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PREFACE 

rpms book has been written primarily for those who have little or no 
previous training in mathematical statistics, but who have some training 
or experience in the presentation and handling of statistical data. It is 
consequently not written in the form of a mathematical treatise, and 
mathematical proofs have not been included. On the other hand an attempt 
has been made to cover all the modern developments of sampling theory 
which are of importance in census and survey work, and to give an adequate 
discussion of the complexities that are encountered in their practical application, 
lhis has necessitated fuller treatment of the subject than is to be found in 
textbooks on mathematical statistics, or than is normally included in statistical 
courses. Indeed, the orderly development imposed by the preparation of a 
ook revealed a number of gaps in current theory which had to be filled in. 
consequently the book should also prove of value to mathematical statisticians 
who are interested in sampling theory and its applications. 

The work had its origin in a request of the United Nations Sub-Commission 
on Statistical Sampling, at their first session held at Lake Success in September, 

,nS\? at , , a ^ manUal be P re P ared t0 assist in the execution of the projected 
]95° World Census of Agriculture, and the 1950 World Census of Population. 
1 he Sub-Commission were particularly impressed with the need for a wider 
use of sampling in the less developed areas, and it was originally intended 
that only the sampling problems encountered in censuses and surveys in these 
areas should be dealt with. On reviewing the matter, however, I came to the 
cone usion that conditions differed so greatly in different areas that it would 
be necessary to cover a wide variety of methods, which in essentials differed 
little from the methods appropriate to censuses and surveys in more fully 
developed areas. It therefore seemed best to take the opportunity of writing 
a more general book. I believe that on balance the course taken will be 
advantageous to those concerned with censuses and surveys in the less developed 
areas, since the modern developments in sampling have been chiefly made in 
conjunction with its application in the more fully developed areas, and they 
can best be explained against the background of the material and problems 
to which they have been applied. 

The various computational procedures have been illustrated, as far as 
practicable by numerical examples. These examples , in the main have an 
agricultural background, since this type of data was most readily accessible 
and is also particularly relevant to the original purpose of the book. For the 
most part the data on which they are based form a small part of the results 
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of much larger surveys. The examples do not in themselves serve as models 
for the reduction of large bodies of data, but once the general principles have 
been grasped no great difficulty should be found in planning this reduction, 
which presents very similar problems to those encountered in the analysis 
of material from complete censuses and surveys. 

I have not attempted to ascribe priority in the discovery of particular 
methods. Indeed, such a task presents almost insuperable difficulties, since 
the methods used in many surveys are not at all fully reported, and the mam 
developments have arisen chiefly through ingenious practical workers devising 
new methods of selection which seemed on commonsense grounds to be capable 
of giving specially accurate results, or appeared to possess other valuable 


properties. • • n 

The book has unfortunately taken longer to prepare than I had originally 
anticipated. This is mainly due to its widened scope and the various theoretical 
investigations that proved to be necessary. It has had to be prepared m 
considerable haste, and I can scarcely hope that it is entirely free from errors 
or omissions, or that perfect balance, has been achieved. Certain minor 
discrepancies in the numerical work arise from the fact that some of the 
calculations have been carried out to more places of decimals than are na y 
reported, and that some of the standard errors have been computed on a slide 
rule. I shall be most grateful, however, if my attention is drawn to any errors 

of consequence that may be discovered. 

I have been much helped by my wife in the planning of the book, and 
I have had considerable assistance in its preparation from various members 
of the Rothamsted Statistical Department. My thanks are particularly due 
to Dr. Rose O. Cashen and Mr. H. D. Patterson for computing and checking 
many of the examples, to Dr. P. M. Grundy and Mr. G. M. Jolly for their 
critical reading of the galley proofs, and to Miss Ruth Hunt and other members 
of the secretarial and computing staff for all their work and assistance. I also 
wish to thank the publishers and printers for the care taken in the preparation 
of this book in spite of the great haste that was necessary. 

I have also received very considerable assistance from the discussions on 
the many and varied aspects of sampling that took place at the first two sessions 
of the United Nations Sub-Commission on Statistical Sampling, and from 
the Secretariat of the Statistical Office of the United Nations, who made 
available a very complete list of references on sampling on which the list given 


in this book is based. 


F. YATES 


Rothamsted Experimental Station, 
1th February , 1949. ^ 
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SAMPLING METHODS FOR 
CENSUSES AND SURVEYS 

CHAPTER 1 

THE PLACE OF SAMPLING IN CENSUS WORK 

1.1 The sampling process 

Sampling, that is, the selection of part of an aggregate of material to 
represent the whole aggregate, is a long-established practice. Simple examples 
are provided by a handful of grain taken from a sack, or a piece of cloth cut 
off a roll. In these cases little attention need be paid to the selection process, 
since the whole of the material is similar or well-mixed, and any part of it 
if not too small is likely to be closely representative of the whole. When, 
however, the aggregate to be sampled consists of units which are somewhat 
dissimilar amongst themselves, and which are not well-mixed, a small sample 
of these units may not be representative of the whole aggregate. Even if units 
are selected from different parts of the aggregate, and other suitable precautions 
are taken, the sample is likely to a certain extent to be unrepresentative owing 
to the chance inclusion of an undue proportion of units of a particular type. 
It will clearly not be representative if units of a particular type are chosen 
deliberately to the exclusion of other types, or if the process of selection is 
such that certain types of unit are favoured at the expense of others. Thus 
in sampling a heap of coal by taking a few shovelfuls from the edges, too great 
a proportion of the large lumps will be obtained, since the large lumps tend 
to roll down the sides and be distributed round the edges of the heap. 
Similarly in the sampling of continuous material, a single portion, even if 
quite large, may not be adequately representative ; a piece of cloth cut off 
the end of a roll in which the quality of the weaving varies progressively, will 
not form an adequate sample of the whole roll. 

Census and survey work is normally carried out on material made up of 
dissimilar units. Censuses of population, censuses of industrial production, 
and censuses of agriculture have the common feature that the aggregate of 
material embraces a large number of separate units which are often markedly 
dissimilar in various respects. In many cases the purposes for which the 
information is required are adequately served if a proportion only of the units 
are covered, but because of the dissimilarity of the different units neither 
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SECT. 1.2 SAMPLING METHODS FOR CENSUSES AND SURVEYS 

haphazard nor casual selection, and still less deliberate selection, can be 
expected to provide a representative sample. Rigorous processes of selection 
have therefore to be used. 

Censuses carried out on a properly selected sample will be called sample 
censuses. There has in the past been a tendency to use the term sample to 
refer to the results of an attempted complete census in which there has been 
failure to obtain information from a substantial proportion of the units. Its 
use in this sense is strongly to be deprecated ; instead the term incomplete 
census is suggested. The term sample should be reserved for a set of units 
or portion of an aggregate of material which has been selected in the belief 
that it will be representative of the whole aggregate. 

1.2 Sampling errors 

Whether or not a sample will give results which are sufficiently 
representative of the whole aggregate depends primarily on whether the errors 
introduced by the sampling process are sufficiently small not to invalidate 
the results for the purposes for which they are required. Even if a proper * 
process of selection is employed, the sample cannot be exactly representative 
of the whole aggregate. The inevitable errors which then occur in the 
results are termed the random sampling errors of these results. The average 
magnitude of these random sampling errors will d epend on the size of the 
sample, on the variability of the material, cm the^sampling procedure adopted, 
a nd on the way in which the res ults are calculated. 

It is a fortunate fact that if a proper process of selection is adopted, the 
average magnitude of the random sampling errors, and indeed the expected 
frequency of occurrence of errors of any magnitude, can be calculated from 
the detailed results obtained from an actual sample. The methods by which 
this can be done depend on the mathematical theory of statistical sampling. 

An extension of the analysis involved in the calculation of these errors 
enables the relative accuracy of the different sampling methods which can be 
employed on the same material to be assessed, and thus enables further surveys 
to be more efficiently planned. 

It is the development of these processes that has changed sampling from 
a speculative and uncertain procedure to a method having definite and 
determinable precision. Sampling has thus become a reliable method in which 
full confidence can be placed. In addition, the possibility of setting ascertainable 
limits to the random sampling errors has served to throw into prominence 
those other types of error which arise from faulty selection processes or faulty 
methods of observation, or which exist in some other source of information 
with which the sampling results are being compared. 

1.3 The place of sampling in census and survey work 

Sampling will only be of use in census work if, as mentioned in Section 1.2, 
the sampling errors are sufficiently small not to affect the validity of the results 
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for the purposes for which they are required. This will in part be a function 
of the degree to which the results have to be broken down. If only overall 
results for the whole population are required, a given degree of accuracy will 
be attained with a far smaller sample than will be the case if detailed results 
for different parts of the population ( e.g . different regions, towns, etc.) are 
required. In certain circumstances the sample may have to be so large that 
there will be little point in using a sample census in place of a complete 
census. Obviously, in the extreme case where information on all the individual 
units is required, this can only be obtained by a complete census. 

Another factor which influences the decision whether or not to use sampling 
is the relative difficulty an d cos t of organizing a sample census and a complete 
census. The amount of efforf and expense required to collect information 
is always greater per unit for a sample than for a complete census. In addition 
a sample census presents its own organization problems, some of which are 
absent from a complete census, and it occasionally happens, if the information 
required is very simple, that a complete census can be carried out through the 
ordinary administrative channels, whereas a sample census requires the 
setting-up of a separate organization. Usually, however, if the size of the 
sample needed to give the required accuracy represents only a small fraction 
of the whole population, the total effort and expense required to collect the 
information by sampling methods will be very much less than that required 
for a census of the whole population. 

In many cases, therefore, samplin g re s ults iq great economy of effort. 
It has also other advantages which* are not so immediately apparent. In the 
first place, the completeness and accuracy of the returns may be much more 
easily ensured if the information is collected from only a small proportion 
of the population. If, for example, questionnaires are sent through the post, 
it is frequently impossible in a complete census to bring pressure to bear, on 
those who fail to make their returns, even where the completion of these 
questionnaires is compulsory, owing to the large numbers of individuals 
involved. In the case of a sample, the smaller number of individuals enables 
follow-up notices to be sent and telephone calls and visits to be made. The 
separate returns can also be much more carefully scrutinized, and further 
enquiries undertaken where there is reason to doubt their accuracy. 

Secondly, it is possible to obtain more detailed information in a sample 
census. Although the burden on the individual of furnishing more detailed 
information is not lessened, except when different items of information can 
be obtained from different individuals, the individuals concerned are more 
likely to be willing to provide such information if they know that they represent 
a small sample of the whole population. Detailed information, when obtained, 
can be more easily handled, both at the stage of abstraction and coding of the 
original information and in the analysis of the coded results. Owing to the 
reduced volume of material that has to be handled the quality of the abs tegtiop 
and anal ysis can also be improved, the former because a higher grade of clerical 
labour can be employed, with better supervision, and the latter because the 
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data can be, classified in many more ways with the same amount of computing 
or machine time. 

Thirdly, in many types of census the use of sampling makes possible 
a very considerable increase in jpeed , both in the execution of the field work, 
and in the analysis of the results. Speed in analysis can also be obtained, in 
the case of a complete census, by taking a sample of the returns for abstraction 
and analysis. This device is frequently of value for providing preliminary 
results quickly, even when a final analysis of the whole of the returns is 
ultimately, required. 

The use of sampling is essential for investigations of the sociological type 
in which extensive and detailed information has to be collected from individuals, 
many of whom have neither the education nor experience required to answer 
detailed questionnaires without assistance. It is equally essential in 
investigations requiring skilled physical observations and measurements. 
Such investigations can only be carried out by the use of trained investigators, 
and complete investigations covering any large group of the population or body 
of material are consequently impossible, both on grounds of expense and 
because, even if the expense can be tolerated, a sufficient body of investigators 
can rarely be recruited and trained. 

For an investigation of this kind involving the collection of elaborate 
information the term survey is usually employed. It seems a mistake, however, 
to confine the word survey to a sample survey or the word census to a complete 
census. Thus B. Seebohm Rowntree (1901), when he carried out an 
investigation into the social and economic conditions of all working-class 
families in York, correctly described this as a survey. 

Although the use of sampling necessarily introduces certain inaccuracies, 
owing to sampling errors, the results obtained by sampling are frequently 
more accurate than those obtained in a complete census or survey. The 
random sampling errors are always assessable. The other errors to which a 
survey is subject, such as incompleteness of returns and inaccuracy of 
information, are liable to be very much more serious in a complete census 
than in a sample census, since far more effective precautions can be taken, 
to see that the information is accurate and complete in a sample census. 
Furthermore, the use of sampling greatly facilitates the imposition of additional 
more detailed checks. Indeed, a complete census can only be properly tested 
for accuracy by some form of sampling check. 

On the other hand, the claim that is sometimes made that the reliability 
and accuracy of the results of a properly planned sample census can be 
assessed with full objectivity from the results themselves is only partly true. 
The random sampling errors can be so assessed, and under certain circumstances 
it is possible to obtain comparisons between different investigators. If all 
investigators or respondents tend to make the same kind of error, however, 
this will not be revealed in the results, whether the census is complete or carried 
out on a sample. 

In respect of coverage a sample census may in certain circumstances be 
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less reliable than a complete census. It is, for example, relatively simple for 
an investigator to ascertain by direct question whether an individual has already 
been included in a population census, and simple intensive checks of certain 
areas, say villages, can be made in a similar manner to verify that there is no 
appreciable number of omissions. Similarly, in a survey of physical objects 
such as houses, a marking system or other suitable device can often be used 
to guard against duplication and omission. Such checks are impossible in 
the case of a sample census. r 

This is one of the most difficult points in the practical design of many 
sample surveys, particularly in undeveloped areas. To overcome it complete 
enumeration can sometimes be used in conjunction with sampling. Where a 
complete enumeration of the whole of the population or aggregate of material 
presents no particular difficulty, but where the collection of detailed information 
from all units would be a difficult or impossible undertaking, a complete 
enumeration can be carried out. This is then used as a basis for the selection 

.he deSd Wo'ir’’ ““ ant ' SUrVey ’ “ “ require<1 “ 

1.4 Development of the use of sampling in censuses and surveys 

Prior to the development of the appropriate methods of estimation of 
a “ P '" g e J ror ® and a c ' lear recognition of the conditions governing satisfactory 

wn,lr° d ff ° f Se ectl ,° n ° f th ! sam P Ie - the use of sampling in census and survey 

rk often proved unsatisfactory. There are many early examples of sample 
censuses and surveys which are defective in one way or another. Even when 
the basic principles of the simpler forms of sampling were understood, the 
attempted use of more complicated forms before methods of evaluating their 

errors and relative efficiency had been worked out gave rise to further defective 
Surveys. 

This has led to a certain mistrust of sampling, which still exists in some 
quarters. During recent years, however, there has been a rapid growth in 
the use of samp mg in various countries. This development has been greatly 
stimulated by the war and its attendant measures of large-scale economic 
control Such measures, if they were to be effective in the changing conditions 

L r V’\ Wartlme ;. demanded an efficient and s P^dy information service 
which only the sampling method could supply. This has resulted in further 
improvements m technique through the stimulation of research into the theory 
of sampling methods, and the provision of basic data for practical investigations 
ot the relative efficiency of the various methods in different fields. 

It still remains true, however, that in inexperienced hands sampling mAy 
give unsatisfactory results, owing to the use of faulty methods of selection 
inappropriate sampling design, or inefficient methods of estimation. The prime 
requirement of any large-scale sample survey is therefore that the organization 
of the survey should be carried out by a person who has adequate knowledge 
d experience of sampling methods and their application. The methods 
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employed must be thoroughly sound, theoretically and practically, both m 
order that satisfactory results may be ensured and also in orderthat mistm 
cannot subsequently be engendered by criticism of the methods adopted 
It must never be forgotten that it is not sufficient to provide results which are 
in fact correct. They must also be generally accepted if they are to have their 

fUl1 It ^ sometimes stated that no large-scale sample census or survey should 
be carried out without the advice of an expert mathematical statistician with 
experience of such work. Unquestionably, if the services of an expert 

can be secured this is all to the good, but my own experience is that no one 
expert can be expected to supervise adequately more than a veryfew surwys 
at any one time, since adequate supervision demands a very full knowledge 
both of the material that is to be surveyed and of local conditions, coup 
with close attention to detail at all stages. An expert acting m an advisory 
capacity is therefore no substitute for the statistician on the spot, who must be 
prepared to accept responsibility for the planning, execution and analysis of 
the P survey. To do this he must himself have both an adequate knowledge 
of sampling procedure and thorough knowledge of the material and local 

^Consequently, if full and effective use is to be made of sampling methods, 
statisticians and others who already have experience of the conduct of co ™P let ® 
censuses but no training in sampling methods must themsdves under ake a 
study of these methods, in order that they may decide in what ways these can 
be applied to their own problems. The function of the expert then becomes 
one of advice on exceptional problems, rather than one of detailed supervision 
Fortunately the principles underlying good sampling methods are not 
unduly difficult to understand, and provided a proper respect is observed or 
the fundamental rules of procedure I believe they can be successfully applied 
by those who have statistical experience but who are not primarily mathematical 

statisticians. 


1.5 Method of presentation 

The method of presentation adopted in this book is to take the various 
parts of the sampling process in roughly the order they are encountered m 
the execution of a census or survey, and discuss the various aspects of each 
part in turn. Thus Chapters 2 and 3 describe the various types of sample 
that can be used, and the general principles to be followed in the selection 
of a sample, Chapter 4 deals with the practical planning of ^ survey, and 
Chapter 5 with the problems encountered in its execution and in the abstraction 
of the results. The remaining chapters are concerned with the more strictly 
statistical problems. Chapter 6 deals with the various methods of estimating 
the population values, Chapter 7 with the estimation of sampling errors and 
Chapter 8 with the determination of the relative efficiency of t e v 
sampling methods. This method of presentation has the advantage that the 
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more practical aspects of sampling procedure are dealt with first. It is true 
that knowledge of the statistical techniques described in the later chapters is 
necessary before the relative merits of different methods of sampling any 
particular type of material can be accurately assessed. The detailed application 
of these techniques, however, is the province of those responsible for the 
numerical analysis of the results, whereas the planning is also the concern 
of those who require the information and those who are concerned with its 
collection. The planning can be undertaken much more efficiently, and with 
added interest, if all concerned understand in general terms the underlying 
problems. It is hoped that study of the first five chapters will give this under¬ 
standing. If they also act as a stimulus to the study of Chapter 6 and the 
first few sections of Chapter 7 the understanding should be correspondingly 
deepened. 

For those responsible for the numerical analysis of the results, and for 
the assessment of the relative efficiency of the different possible methods, 
thorough study of the whole book is necessary. This study should include 
the reworking of the numerical examples. Only by this procedure can a' 
thorough grasp of the details of the various methods be obtained. 

The separation of the discussion of the methods of estimation of the 
population values, of the sampling errors, and of efficiency necessarily involves 
a good deal of cross-reference, particularly in the numerical examples. Since 
this appeared inevitable, it was with some hesitation that the chosen method 
of presentation was adopted. On balance, however, this disadvantage appeared 
to be outweighed by the advantage of being able to present as a whole the 
relatively simple techniques involved in estimation before the more complicated 
techniques required for the estimation of error and the assessment of relative 
efficiency. It is believed that this will make the book more useful to those 
who do not require to go deeply into these latter techniques. For those who 
prefer it, there is nothing to prevent the simultaneous study of the corresponding 
sections of Chapters 6 and 7, or indeed of Chapters 6, 7 and 8. Chapters 6, 

7 and 8 may also, if desired, be taken before Chapters 4 and 5. 

1.6 Terminology and notation 

The question of terminology was considered by the United Nations Sub- 
Commission on Statistical Sampling, at its second session held in Geneva in 
September, 1948. Their recommendations are included in a memorandum 
entitled Recommendations concerning the Preparation of Reports on Sampling 
Surveys. With a few minor exceptions the terminology adopted in this book 
is that recommended by the Sub-Commission. 

New conventions have been adopted for the mathematical notation. The 
use of bold face and Gill Sans type for population values and their estimates, 
and of capital letters for the population totals, has enabled the formula to be 
presented in a very simple, and it is hoped easily understandable form. By 
the use of this notation the elaborate summation notation which has become 
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current in much of the literature on sampling has been avoided. It is 
recognized that the notation is not particularly convenient for manuscript 
and typescript, but the difficulty can in fact be overcome by the use of single 
and double underlining, with the corresponding verbal descriptions of 
“ sub-bar ” and “ sub-double-bar.” 



CHAPTER 2 

REQUIREMENTS OF A GOOD SAMPLE 

2.1 Bias 

The principal object of any sampling procedure is to secure a sample 
which, subject to limitations of size, will reproduce the characteristics of the 
population, especially those of immediate interest, as closely as possible. 

At first sight it might appear that the most accurate results could be 
obtained by deliberate selection of the units to be included in the sample. 
In particular, if averages only are of interest, units might be selected which 
appear to be nearest to tfie average. If, for example, a quick assessment of 
the yield per acre of an agricultural crop is required, district officers might 
be asked to select some “ average ” fields in each district, and to determine 
the yields of these fields. 

Such a sample is unfortunately very often of little value. Its primary 
fault is that it may well be biased , that is, the selection of all the fields may 
be affected by similar errors. Thus, in order to enhance the reputation of 
their districts, all district officers may tend to select fields which yield more 
eavily than the average, or, if they feel that the interests of the farmers or the 
country may be furthered by an underestimate, they may select fields which 
yield less than the average. 

Even if the district officers can be trusted to be completely objective 
considerable unconscious errors of judgment, all tending in the same direction’ 
may still occur, and such errors may far outweigh any increase in accuracy 
resulting from deliberate selection. Nor will increase in the number of officers 
concerned m the selection necessarily improve matters, since all may be subject 
to the same type of error. 

We may consequently distinguish between two, types of jampling error, 
*22§£iH§jSg fr .°m biases in selection, etc., and those due to change differences 

population included in the sample andlhosiT 
The aggregate of the former in the sample will be termed the 
grroy f 0 f ()s and the aggregate of the latter the random sampling error , or 
when bias is known to be absent, the sampling error. The total sampling error 
will, of course, be made up of the bias, if any exists, and the random sampling 
T “ e essence of bias is that it forms a constant component of error 
which does not decrease, in a large population, as the number in the sample 
increases whereas the random sampling error decreases on the average as 
the number in the sample increases. 

2,2 Methods of selection which give rise to bias 

There are a number of ways in which faulty selection of the sample may 
give rise, to bias. The main causes may be broadly classified as follows 

^ ‘‘ rep resentative ” sample. This is the type 

of bias described above. .~ —". .. 
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(2) A procedure of selection de pending 

correlated with propert ies of the unit which are o^jnterest 4 Many 
hapiazardTelection processes give rise to biases of this kind. 

(3) Co nscio us j)XJ^c a ?<< ran( ^ Q p — 

"if a^proper random process is not strictly adhered to, the investigator, 
although claiming that his sample is random, may allow his desire 
to obtain a certain result to influence his selection. This type of bias 
is particularly serious, since its existence may not be immediately 
apparent. 

(4) Substitution. Investigators often substitute 

of the population whe n diffi culties are, ejicpunte 
information. Thus, in a house-to-house survey the next house 
may be taken when there is no reply. This will necessarily lead to 
a preponderance of houses of the type that are occupied all day, 
e.g. houses of people with families. 

(5) Failure to cover the whole of the c hosen sample. If no second visit 

irmadel^houses'^Si 4 which received there will still 

be bias even though no substitution is attempted. This fault is 
particularly prevalent in postal questionnaires, which are often very 
incompletely returned. Returns are clearly likely to be received 
from individuals who are specially interested in the objects of the 
survey, or possess other characteristics which make them 
unrepresentative of the whole population* 


2.3 Avoidance of bias in selection 

It is clear that, if possibilities of bias exist, no fully objective conclusions 
can be drawn from a sample. The first essential of any sampling procedure 
must therefore be the elimination of all important sources of bias. 

The simplest,only universally certain way, of ayojdmg bias in 
the selection process is for the sample to be drawn eithexeptkely at random , 
oTTT^mndom "subject to restrictions^^hich, while improving the accuracy, 
are o f such aHnaturelth at they do not introducebias into the results. In some 
cases, however, certain forms of systematic selection, such as the selection of 
names at equal intervals down a list, or the use of an evenly spaced grid of 
points on a map, may be permissible. 

Random selection does not mean haphazard selection. A random sample 
can only be obtained by adherence to some proper random process, such as 
the drawing of lots or the use of a table of random numbers. Sticking pins 
into a map will not give a random distribution of points in a map. The 
selection of houses by walking through the streets of a town will not give 
a random selection of houses in the town. The words “ random and 
“ random sample ” are, in fact, gravely abused. For this reason, if for no 
other, the method of selecting the sample should be specified in all accounts 
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of the results of sample surveys and censuses, and indeed in all sampling work. 

In order to prevent careless or deliberately biased selection on the part 
of investigators it is often important in large-scale work for the selection to 
be done in some central office, in such a manner that no element of choice is 
left to the investigators, and in such a manner also that checks on the field 
work can be imposed if necessary. Even in cases in which a less rigorous 
method of selection may be judged to be satisfactory, it may be necessary 
to impose a rigorous method in order to prevent criticism on this ground by 
those not familiar with the details of the work. 

2.4 Examples of biased selection 

It may be well at this stage to give some actual examples of cases in which 
an unsatisfactory method of selection has introduced serious bias into the 
results. 

The first example is taken from a paper by Kiser (1934, D). A sample of 
households was taken in Syracuse, IJ.S.A., in 1930 and 1931, with the object 
of making a study of morbidity. It was also intended to use this sample for 
tlie study of birth-rates. Before beginning this latter study, which was 
subsidiary to the morbidity study, a comparison was made of the sizes of 
households of the sample with those of the corresponding census tracts. This 
comparison is shown in Table 2.4.a. (Households of one were not included 
in the survey.) 


Table 2,4.a—S ample of households in Syracuse: distribution of 

HOUSEHOLDS ACCORDING TO SIZE, IN THE ORIGINAL SAMPLE, AND* IN THE 
CENSUS TRACTS 


Number in 
household 

Original sample 

Census tracts t 

Number 

Per cent. 

Number 

Per cent. 

2 . 

254 

19-4 

1,762 

26-8 

3 . 

338 

25-9 

1,745 

26-5 

4 . 

307 

23-5 

1,438 

21-9 

5 . 

201 

15*4 

853 

13*0 

6 . 

106 

8-1 

388 

5-9 

7 

46 

3-5 

208 

3*2 

8 . 

25 

1-9 

96 

1*5 

9 and over 

29 

2-2 

86 

1*3 

Total 

1,306 

99*9 

6,576 

100*1 


It is immediately apparent from the table that the sample contains a 
considerably greater proportion of large households than exist in the whole 
population. Households of two are under-represented in the sample to the 
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extent of 7*4 per cent, of all households, or 28 per cent, of the households of 
this size. This deficiency is attributed by Kiser to the failure of enumerators 
to revisit missed households, in which childless married women working away 
from home are likely to predominate. In order to provide a more satisfactory 
sample it was necessary to make a further survey of those families that were 
missed altogether at the time of the morbidity survey. 

It is interesting to note that the sample was apparently considered 
satisfactory for the morbidity study, as is indicated by the statement that the 
workers “ had been primarily concerned with securing a sample representative 
of the area in regard to prevalence of sickness rather than size of household.’* 
Actually such a biased sample can scarcely be regarded as wholly satisfactory 
for even a morbidity study, since sickness rates are likely to vary with the 
size and composition of the family. 

The second example is one obtained at Rothamsted in an experimental 
sampling of a collection of stones (Yates, 1936, b , H). The stones, a number of 
flints of varying sizes, some 1200 in all, were spread out on a table, and twelve 
observers were each instructed to choose three samples of twenty stones which 
should represent as nearly as possible the size distribution of the whole 
collection. Table 2.4.b gives the mean weights per stone of these 36 samples, 
and also the true mean weight of the whole collection. 


Table 2.4.b— Mean weight per stone in samples of 20 stones ( oz ) 


Observer 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

Sample 1 

1*9 

2-4 

2-4 

1-9 

2-2 

2-8 

2-4 

1-6 

2-2 

2*6 

2*4 

2-4 

Sample 2 

1 -8 

3-0 

2*4 

2-0 

2-7 

2-6 

2-6 

2-0 

2-2 

2-2 

2*4 

3*0 

Sample 3 

1*7 

2-4 

2-1 

2-0 

3-1 

2*8 

2*5 

2-0 

2-2 

3-1 

1*8 

2*4 

Mean 

1*8 

2-6 

2-3 

2-0 

2*7 

2-7 

2-5 

1-9 

2-2 

2-6 

2-2 

2-6 


Mean of all samples : 

2-34 

oz. 

True 

mean : 

1-91 

oz. 




It is apparent that there is a tendency, which is common to most observers, 
to select stones which are on the average larger than those of the whole 
collection. Of the twelve observers ten chose samples whose mean weight 
was above the mean weight, 1*91 oz., of all the stones, the mean for all samples 
being 2*34 oz. This tendency is consistent from sample to sample. Thus, 
of the thirty samples chosen by the above ten observers, all but two had mean 
weights greater than the mean weight of all stones, while all three samples of 
observer 1 were less than the correct mean. 

In this example the selection was deliberate. A further example showing 
similar effects arising from haphazard selection (claimed by the observer , to 
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/ ; 

be “ random ”) is provided by some observations obtained in the course of 
a scheme of sampling observations on the growth of wheat instituted by the 
Agricultural Meteorological Committee (Yates, 1935, A). 

In this scheme measurements on the heights of shoots of wheat were made 
at regular intervals on observation plots at a number of centres. A detailed 
procedure had been laid down for the random location on each occasion of 
128 quarter-metre lengths of row in sets of 4 on contiguous rows. The height 
measurements were made on the 256 shoots at the ends of these lengths—test 
observations conducted at another time indicated that this method of selection 
was virtually random. At one centre a drill with fewer rows than normal 
had to be used, and as a result only 192 shoots were available for measurement 
on each occasion. In order to provide the number of observations laid down 
and thereby, as he thought, improve his results, the observer selected “ at 
random ” two additional shoots from each set of three quarter-metre 
lengths. Fortunately he booked the observations on these additional shoots 
separately. 

Figures 2.4.a and 2.4.b show the distribution of the regular and additional 
measurements taken on the 31st May and on the 28th June respectively. The 
deviations from the set means of the regular measurements are shown. 
Suitable adjustments, details of which are given in the original paper, are 
made to the additional measurements to give fair representation of the 
variability as well as the bias in the mean. 

Examination of Figure 2.4.a indicates that on this date the additional 
measurements show a considerable preponderance of positive deviations with 
a corresponding deficiency of negative deviations. There is, in fact, a tendency 
to select shoots which are higher on the average than those of a truly random 
sample, the difference in the average height being -fi 3*3 cm. This difference 
is clearly in the nature of a bias, and cannot be attributed to random sampling 
errors. ; 

The situation was entirely different on the 28th June, as is shpwn in 
Figure 2.4.b. At this date the deviations of the additional measurements, 
both positive and negative, are smaller on the average than are those of the 
regular observations ; in other words, there is a tendency to select shoots 
which are nearer the mean height than they would be on the average in a truly 
random sample. In spite of this, there is again a considerable bias, this time 
negative, the mean difference being —2*7 cm. In this case, therefore, a single 
additional shoot will give a value which on the average is closer to the true 
mean value than is the value given by a single randomly located shoot, but 
as the number of shoots is increased the relative accuracy of the random sample 
progressively increases, and with the numbers of shoots actually taken, the 
random sample is considerably more accurate. 

This example provides an illustration of a case where the biases on the 
two occasions, though arising from similar defects in selection, are of very 
different magnitude, and indeed of opposite sign. Consequently the difference 
of the two sets of measurements will also be seriously affected by bias. In 
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SCALE OF DEVIATIONS FOR REGULAR OBSERVATIONS (cm) 

Fig. 2.4.a—D istribution of regular observations { shaded ) and additional 
observations ( unshaded ) of heights of wheat shoots on 31st may 

(By courtesy of the editor of the Annals of Eugenics.) 



Fig. 2.4.b—D istribution of regular observations { shaded ) and additional 
observations { unshaded ) of heights of wheat shoots on 28th June 

(By courtesy of the editor of the Annals of Eugenics.) 
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this case the growth rate of the wheat would have been underestimated by 
nearly 10 per cent, had only the additional measurements been available. 

These biases are, of course, of the type that might be expected. When the 
shoots are only half-grown and there is nothing much to be seen except the 
top leaves there will be a tendency to pick the longer shoots, but when the 
crop has come into ear the observer can see shoots of all lengths, and is more 
likely to select shoots somewhere near the average, omitting both very long 
and very short shoots. The strong negative bias of the last set of measurements 
shows that this selection was not particularly effective in improving the 
accuracy of the sample. 


2.5 Bias arising from faulty demarcation of the sampling units 

Any consistent errors in measurement will clearly give rise to bias, 
whether the measurements are carried out on a sample or on all the units of 
the population. The danger of such errors is, however, likely to be greater 
in sampling work, since the units measured are often smaller. Furthermore, 
the knowledge that had another sampling unit by chance been selected a very 
different value might have been obtained, may lead the inexperienced worker 
to believe that accuracy in the measurement of the selected units is of little 
importance. 

When the sampling units are not natural units of the population, the selected 
units usually have to be demarcated at the time the measurements are taken. 
In crop sampling work in particular, where small areas are selected in order 
to obtain an estimate of the yield or other characteristics of the crop, location 
of the areas by means of randomly selected co-ordinates, though theoretically 
ensuring a random sample, will only in practice do so if the field work is carried 
out with complete objectivity. Since it is impossible in practice to locate the 
areas according to their co-ordinates by means of exact measurements, pacing 
or some similar approximate method must be used. 

In this type of work the areas themselves should not be too small, both 
because errors in the demarcation of the boundaries become of increasing 
importance as the size of the unit is decreased, and also because the possibility 
of influencing the results by small changes in location, e.g . so as to include 
a particularly good plant, is greater the smaller the unit areas. Very small 
areas are capable of giving completely reliable results with experienced and 
well trained field-men, but may be very unreliable when used by inexperienced 
workers, particularly if the need for complete objectivity is not appreciated. 

Sukhatme (1946a, H), for example, has reported the biases shown in 
Table 2.5 in some trial crop-sampling work on wheat. He himself expresses 
the opinion that the biases of the very small areas are due to the inclusion of 
border plants. This, however, would imply that the effective radius of the 
smallest areas, which were nominally circles of 2 ft. radius, would have to be 
increased by nearly 5 inches. Errors of this magnitude appear improbable, 
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unless the observers were very careless in their work, and it seems likely that 
part at least of the bias has been caused by faulty location. 

Eye estimates are themselves a form of measurement, but such estimates 
are always subject to bias, which is likely to vary from observer to observer, 
and is often very substantial. If eye estimates are used, steps must therefore 
be taken to eliminate the resultant biases by carrying out proper measurements 
on a sub-sample of the material. A simple example of this is provided by the 
1938-9 Census of Woodlands described in Section 4.25 and Examples 6.12.b 
and 7.11. A more complicated example is discussed in Sections 6.15 and 7.14. 


Table 2.5 Bias in the use of small-size areas in sample surveys for 

yield (Sukhatme) 


Size of area 
in sq. ft. 

No. of 
areas 

Average yield in 
mannds per acre 

Percentage 

overestimation 

I Irrigated 




471-5 

78 

10-10 


117-9 

78 

10-58 

4-8 

29-5 

78 

11-69 

15-7 

28-3 

117 

11-60 

14-9 

12-6 

117 

14-38 

42-4 

Unirrigated 




471-5 

107 

6-55 


117-9 

107 

7-27 

11-0 

29-5 

107 

8-08 

23-4 

28-3 

162 

7-52 

14-8 

12-6 

161 

9-33 

42-4 


2.6 Bias in estimation 

In addition to biases which arise from faulty processes of selection and 
faulty work during the collection of the information, faulty methods of 
analysing the results may also introduce bias. A simple example occurs in 
the estimation of ratios. If, for instance, an agricultural crop is grown on 
types of land with different levels of fertility, and if the fields on the different 
types of land are of different average size, the mean yield per acre estimated 
from the mean of the yields per acre of all the fields may be" markedly different 
from the mean yield per acre of all the land growing the crop. To take a 
numerical example, if there are three types of land having average yields of 
20 cwt., 15 cwt. and 10 cwt. per acre respectively, and fields of an average 
size of 5 acres, 10 acres and 15 acres respectively, the number of fields on each 
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type of land being the same* the mean yield per acre over the whole of the 
land will be given by the weighted mean 

5x20+10x15 + 15x10 

5 1Q + 15 -= 13f cwt. per acre 

whereas the mean of the yields per acre of all fields will be 15 cwt. Consequently 
the bias in the estimate by the latter process will be about 12 per cent. 

Biased estimates can be avoided by using the proper processes of estimation. 
This, matter will be dealt with more fully in Chapter 6 . 

2.7 Circumstances in which bias is permissible 

Although avoidance of any substantial bias is usually of the utmost 
importance, particularly in censuses on which administrative action has to be 
based, absence of bias is not always essential. In some types of investigation 
a certain amount of bias, provided it is reasonably constant, can be accepted. 
In censuses which are repeated at frequent intervals with a view to determining 
the changes rather than absolute values, for instance, a small overall bias may 
be of little consequence, provided it is constant in time. Similarly in surveys 
which have as their main objective the comparison of different groups of the 
population a bias which is approximately constant from group to group will 
be of little importance. The investigator must also avoid attaching exaggerated 
importance to minor sources of bias which, in fact, can only produce errors 
which are trivial relative to the random sampling error. 

2.8 Methods of reducing the random sampling error 

Once the absence of any important bias has been ensured, attention can be 
turned to the random sampling errors. These must clearly be sufficiently 
small to achieve the accuracy required. 

Apart from errors due to bias, the simplest way of increasing the accuracy 
of the sample is to increase its size. Other things being equal, the random 
sampling error is approximately inversely proportional to the square root of 
the number of units included in the sample. 

The accuracy attained will, however, depend not only on the number of 
nnits included in the sample, but also on the variability per unit ; or, more 
strictly, on that part of the variability per unit which contributes to the 
sampling error. It is here that the complications of sampling procedure, both 
of design and of subsequent analysis, arise. By suitable processes of selection, 
restrictions on fully random “selection donot introduce 
bia s int Q the results, the part of the variability per unit’w+!ch contributes to 
the .sampling error can often be ..substantially reduced, andAhe size'of the 
sample required for a given accuracy thereby diminished. 

The simplest type of restriction is that known as stratification . The 
population is “ stratified ” or divided into blocks of units in such a manner 
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that the units in each stratum or block are as similar as possible. Each of 
the strata is then sampled at random. If the same proportion is taken from 
each stratum, it is clear that each stratum will be represented in the correct 
proportion in the sample, and consequently differences between different strata 
are eliminated from the sampling error. 

In addition to stratification there are a number of other devices, which 
will be discussed in more detail later, by which the accuracy of the sampling 
procedure can be increased, often very substantially. The three most important 
are : utilization of supplementary information, use of a variable sampling 
fraction (sometimes called “ optimal allocation ”) and multi-stage sampling. 

Utilization of supplementary inf ormation , that is information which is derived 
from sources other than the sampling scheme, or from a more extended sample 
than that on which information on the main characters is collected, takes a 
number of forms. A simple example will illustrate the general principle. 
Suppose that an estimate of the wheat yield of a country is required, and that 
a random sample of wheat fields has been taken and the total yield of each 
field determined. We can then estimate the total wheat yield of the country, 
either (a) by multiplying the total yield of the sample by the reciprocal of the 
proportion of the fields included in the sample, or ( b ) by calculating the mean 
yield per acre of the sampled fields (by dividing the total yield of all the sampled 
fields by their total area, so as to avoid bias) and multiplying this mean yield 
by the total acreage of wheat in the country. The latter estimate can only be 
made if the total acreage of wheat in the country is already known with sufficient 
accuracy, e.g. from returns made by the farmers or from a larger sample. 
If this information is available the second estimate is likely to be considerably 
more precise than is the first, since the variability of the total yields, which 
in so far as the yield per acre is constant will be proportional to the areas of 
the individual fields, is likely to be considerably greater than the variability 
of the yields per acre of the individual fields. 

The use of a variable sampling fraction , i.e. the inclusion of different 
proportions of the different strata in the sample, enables the more important, 
or more variable, parts of the population to be sampled more intensively. 
If this is done it will of course be necessary to weight the contributions of the 
different strata to the total in the correct proportions. 

The optimal sampling fractions depend on the relative variability of the 
different strata into which the population is divided for the purpose of taking 
a sample. Thus, if it is required to determine the number of workers in a 
given industry, it will be better to take a much larger fraction, possibly all, 
of the large factories than of the smaller factories. 

In multi- stage sampling the population is divided into a number of first-stage 
sampling units, which are sampled in the ordinary manner, the selected 
first-stage units being subdivided into smaller second-stage units, which are 
also sampled. Further stages can be added if required. Thus, for example, 
in a population survey, a sample of all towns and villages may be taken, and 
in each of the selected towns and villages a sub-sample of all households may 
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be taken, with, possibly, for certain purposes, a further sub-sample of individuals 
from the selected households. 

23 Choice of unit 

In some classes of material there is considerable choice in the type and size 
of sampling units, and this gives further scope for increase in the efficiency 
of the sampling procedure. In general, when.a given proportion of the material 
is included in th e sample, the smaller the sampling units employed, the more 
accurat e an d representative will be the results . Thus, for example, in an 
agricultural survey, it wifi, be more accurate to take 10 per cent, of all farms in 
each parish, or other small administrative unit, than to take all the farms in 
10 per cent, of the parishes. This will remain true even if multi-stage 
sampling is adopted. It will be more accurate, for example, to take 10 per cent, 
of all the parishes in each county, with a second-stage sampling of the farms 
of each selected parish, rather than to take all the parishes in 10 per cent, of 
the counties, with the same degree of second-stage sampling. The reason 
for this is fairly obvious. The units in any region are likely to be more alike 
than are those of different regions, and if regions are used as sampling units, 
all the units in a region will be included or excluded from the sample 
simultaneously. 

This need for small units distributed over the whole of the population 
often conflicts with the administrative requirements. It is clearly easier to 
arrange for a survey of farms in compact areas, such as parishes or counties, 
than to have to survey the same number of farms scattered over the whole 
country. The choice of a suitable balance between these two conflicting 
requirements is often one of the main problems in the planning of a sample 
survey. Furthermore, if only a small number of large units are included in 
the sample, whether or not there is second-stage sampling of these units, the 
sampling error will not be well-determined, since there will be relatively few 
differences between units on which to base the estimate of this. 

We see, therefore, that the choice of sampling method depends not only 
on the relative accuracy of the different methods, but also on their practical 
convenience. It is important, for example, that the p rocess o f selection of the; 
sample should not involve excessive preliminary work in the form of mapping, 
etc. The most suitable sampling method will therefore depend very much 
on the type of information that is already available on the population to be 
sampled ; a method which may be excellent for a country where good maps 
are available may be entirely useless in a country which is inadequately mapped. 
Again, it is important that not only should the collection of the information 
not involve excessive travelling, but also it should be possible to subject the 
field-workers to proper supervision : consequently, sampling procedures which 
may be excellent with postal questionnaires may be entirely unsatisfactory 
when the information is collected by special investigators. 
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CHAPTER 3 


THE STRUCTURE OF VARIOUS TYPES OF SAMPLE 


3.1 : Definition of frame and sampling unit 

III this chapter we propose to give a technical description of the structure 
of the -various types of sample which are most commonly employed in practice, 
and the methods which must be followed in selecting them. The methods of 
obtaining- estimates of the population values and of the sampling errors from 
the sample Values will be discussed in Chapters 6 and 7. 

* All rigorous sampling demands a subdivision of the material to be sampled 
into Units, termed samp ling, units, which form the basis of the actual sampling 
procedure. These units may be natural units of the material, such as individuals 
in a human population, or natural aggregates of such units, such as households, 
or they may be artificial units, such as rectangular areas on a map, bearing no 
relation to the natural subdivisions of the material. 

It is not always necessary to make an actual subdivision of the whole of 
the material before selection of the sample, provided the selected units can be 
clearly and unambiguously defined. Thus, with sampling units which are 
rectangular areas on a map there is no need to demarcate all these areas ; they 
can be defined by co-ordinates, and the selected areas demarcated after selection. 

Clear and unambiguous definition demands the existence or construction 
of some form of frame. In the sampling of a human population, for instance, 
with households as sampling units, there must be available a list of all house¬ 
holds, and this list must be such that any household selected from it can be 
unambiguously located. In area sampling from maps, the maps must be such 
that the selected areas can be unambiguously defined on the ground. 

The specification of the frame implicitly defines the geographical scope of the 
survey and the categories of material covered. A survey of a human population 
based on a list of households, for instance, will only cover those categories 
of the population which constitute the households included in the list. If 
other categories require inclusion, or if the frame is defective, special steps 
will have to be taken to supplement and emend it. 

In statistical terminology any aggregate of values is termed a population , 
and consequently the whole aggregate of sampling units into which the material 
is divided is known as the p opulation o f sampling units. If the sampling units 
are aggregates of the natural units of the material, these natural units will form 
a further population which must be distinguished from the population of 
sampling units. 

In America the term cluster sampling has been applied to sampling in which 
•the sampling units are aggregates or 44 clusters ” of the natural units. The 
term is a somewhat loose one, since there is often a hierarchy of natural units, 
e.g . a sample in which the sampling units are households may be regarded as 
an ordinary sample of households or as a cluster sample of individuals. 
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In multi-stage sampling there is also a hierarchy of sampling units, first- 
stage, second-stage, etc., corresponding to the different sampling stages, and 
each set of units will form its own population of units. 

Sampling units may be of the same or differing size. They may contain 
the same, or approximately the same, number of natural units, or they may 
contain widely differing numbers. The whole procedure of sampling, including 
the estimation of the population values and the sampling errors, is simplest 
when the sampling units are of approximately the same size and contain 
approximately the same number of natural units. Often, however, the material 
is such that this condition cannot be conveniently fulfilled. In particular, if 
the natural units are themselves of widely differing size, variation in size of 
the_ sampling units or in the number of natural units they contain is inevitable. 

# There is nothing in the sampling process which demands that the sampling 
units, should be of any particular size, but, as has, been explained in Section 2.9, 
the smaller the sampling units employed the more accurate will be the, 
results obtained when a given proportion of the material is included in the 
sample. . 

3.2 Random sample , 

A random sample is the simplest type of rigorously selected sample, ^nd 
is the basis of most of the more complicated sampling methods. In a. random 
sample, after subdivision of the material into sampling units, the requisite 
number of units are selected at random from the whole population of units. 

As has been emphasized in Section 2.3, random selection implies a strict 
process of selection equivalent to that of drawing lots. Ip* practice it may 
be carried out either by some such process, or preferably, since adequate 
shuffling of cards, etc., is difficult, by the use of a table of random numbers. 
A small table of random numbers is given at the end of the book. The examples 
of this section illustrate the use of such a table. 

The process of random selection may proceed in two stages. Suppose 
that the population is divided into groups of units containing x ly x %y ^ 3 , * . v % 
units. The successive sub-totals .., 

X 1 ~ X 1 T* x 2 — X 2 , X 1 + X 2 + X 3 = X 3 , . . . X x ~\~'X 2 -j- • • . + Xri = X n 
are first calculated, which is easily done on a printing adding machine. The 
requisite number of numbers are then selected at random'between I and X n> 
numbers that occur more than once being rejected. A selected number that 
is greater than X s -i> but less than or equal to X s , indicates that a unit of the 
sth group is to. be taken. Selection of a unit at random from this group, which 
can if convenient be made on the basis of the number already selected,; will 
then give the equivalent of completely random selection.. 

This two-stage process is of value when the full numbering or demarcation 
of the units in all groups before sampling is laborious, since only the total 
number of units in each groups need be known. It is of particular value , when 
the units are artificially demarcated areas, and the total areas of natural 
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subdivisions of the material are known. By using the process only the units 
in the selected groups have to be numbered or demarcated. 

Example 3.2.a 

Select a sample of 20 from a population of 2879 units. 

Using the four-figure numbers given by the first four columns of digits 
in Table A. 1, and rejecting all numbers greater than 2879, we obtain the 
sample 347, 1676, 1256, 1622, 1818, 2662, 2342, 1608, 2742, 39, 1690, 1127, 
1490, 2046, 526, 797, 2699, 1465, 2467, 1753. 

The above procedure results in the rejection, in this example, of nearly 
three-quarters of the random numbers given by the table. Various devices 
may be used to avoid this. In the present example the simplest is to take 
the numbers 3001-6000 and 6001-9000 as equivalent to 0001-3000, rejecting 
the numbers 9001-9999 and 0000. Using the second column of four-figurd 
numbers gives the sample 1373, 2467, 227, 2599, 2635, 1794, 1753, 378, 1234, 
2632, 792, 897, 1064, 2819, 1712, 1837, 2722, 1504, 13, 2565. 

If with either of the above procedures the same mnit is selected a second 
time, the number leading to this selection is rejected, and an additional 
number taken. 

It will be noted that neither of these samples is evenly distributed over 
the whole range of units. The distribution between the different thirds of 
the range is in fact: 


Numbers 

1st sample 

2nd sample 

1-960 . 

4 

5 

961-1920 

10 

8 

1921-2879 

6 

7 


20 

20 


Random selection will give samples that deviate somewhat from an even 
distribution, the actual deviations being themselves governed by statistical 
laws. Exact statistical tests show that about three out of four samples will have 
smaller aggregate deviations than the first sample, but only three out of ten 
will have smaller aggregate deviations than the second sample.* 

Example 3.2.b 

Select unit areas mile X To mile at random from a rectangular area 
5 miles X 4 miles. 

There are 2000 unit areas, which can best be defined by co-ordinates 1-50 
along the longer side of the rectangle, and 1-40 along the shorter side, the 

* The appropriate test is that known as the X 2 test. A description of this test 
will be found in most modem statistical textbooks. 
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co-ordinates selected defining the corner of the unit area furthest from the 
corner of the rectangle (0, 0). The selection of a number at random between 
1 and 50, and a second between 1 and 40, will therefore select a unit at random. 
Taking the third column of four-figure numbers (beginning 8636) and following 
the second of the procedures of Example 3.2. a gives the pairs of co-ordinates 
36, 36 ; 12, 02 ; 16, 16 ; 14, 38 ; etc. 

If points instead of unit areas are to be selected each co-ordinate range 
should theoretically be infinitely subdivided. The actual degree of subdivision 
need not usually be very fine. 

The procedure of this example may be used for the selection of unit areas 
or points from an irregularly shaped area, provided the extreme range of 
each co-ordinate is included, points falling outside the area being rejected. 
More elaborate processes, involving less rejection, can of course be devised, 
but care must be taken that the probability of selection of all areas or points 
is equal # Thus in a triangular area, the selection of lines parallel to the base 
at random distances from the base, followed by the selection at random of 
a point within the triangle on each of the selected lines, will give a greater 
density of points near the apex of the triangle. The selection of points within 
a circle by the selection of random distances and bearings from the centre 
will give a greater density of points near the centre. In irregularly shaped 
areas, also, fractional unit areas requiring special treatment will occur at the 
boundaries. 

Example 3,2.c 

14 streets in a ward contain 25, 17, 5, 59, 64, 22, 38, 16, 21, 12, 14, 38, 
17, 23 houses respectively. Make a random selection of 6 houses from all 
371 houses. 

The successive sub-totals are 25, 42, 47, 106, 170, 192, 230, 246, 267, 279, 
293, 331, 348, 371. A table of random numbers gives the numbers 72, 128, 
96, 326, 199, 202. The units 72 and 96 therefore fall in the 4th street, the 
unit 128 in the 5th street, the units 199 and 202 in the 7th street, and the unit 
326 in the 12th street. Since 72 — 47 = 25, and 96 — 47 = 49, the 25th. 
and 49th houses in the 4th street are selected, etc. The numbering of four 
streets, involving 199 houses, is required. 

3.3 Stratification with uniform sampling fraction 

In a stratified sample the population of sample units is subdivided into 
groups or “ strata ” before selection of the sample. These strata may all 
contain the same number of units, or differing numbers of units. If a uniform 
sampling fraction is used, the same fraction of the units of each stratum is 
included in the sample, the units selected being chosen at random from all 
the units within each stratum. A stratified sample is thus equivalent to a set 
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of random samples on a number of sub-populations, each equivalent to one 
stratum. 

Stratification has two purposes. The first is to i ncre ase the accuracy of 
the overall population estimates. The second is to ensure ffiaFSubdivisions 
of the population which are themselves of interest are adequately represented. 
Such subdivisions may be termed domains of study ^ Maximum overall accuracy 
will be attained if the strata are so chosen that the units within each stratum 
are as similar as possible. It will often be advisable to use domains of study 
as strata, however, even if some other form of stratification might be expected 
to give somewhat more accurate results. If there is marked heterogeneity 
within some or all of the domains of study these may themselves be subdivided 
into smaller strata for the purposes of sampling. 

Stratification affects the estimation of the sampling error. Sinc e in a 
stratified s ample only variation within strata gives rise to sampling error, it 
is this com ponent of variatio n t hat requires estimation, and this can in general 
only be done from differences between units in the same stratum. It is 
therefore necessary, if an estimate of sampling error is required, that the strata 
be of such size that the sample contains two or more units from at least the 
majority of strata. In certain cases, in which the use of strata containing only 
a single selected unit appears advisable on account of the gain in accuracy 
thereby obtained, special methods, often of an approximate nature, of estimating 
the sampling error have to be adopted. The matter is discussed further in 
Section 8.15. 

If the sampling units are already classified in the required strata, the 
selection of a stratified sample can be made in the same way as a random sample, 
the requisite number of units being selected at random from each stratum. 
If, however, the population is not so classified, selection by this method Would 
necessitate prior classification. In this case, if the numbers of units ill the 
different strata are known, an alternative procedure is available. This consists 
of selecting a sample at random ; keeping a tally, as the selection proceeds, 
of the numbers falling in each stratum ; and rejecting any further members 
of a particular stratum as soon as the requisite number for that stratum has 
been obtained. On the other hand, if the numbers of units in the different 
strata are not known, a count covering the whole population will in any case 
have to be made, in which case a classification which will serve as a basis for 
the subsequent selection of the sample may well be carried out simultaneously. 

Unless all the strata contain the same number of units it will usually happen 
that the chosen sampling fraction will not give an exact whole number qf units 
in each stratum. In this case the nearest whole number of units has to be 
taken. We may thus differentiate between the yjQtking sampling fraction^ 
Which with stratification with a uniform sampling fraction is the same fqr all 
strata, and the exact sam pling fractions.^ which will differ slightly from the 
working sampling fraction. The use of the working sampling fraction in the 
analysis of the results leads to minor inaccuracies, but these will seldom give 
rise to errors of any practical importance. v 
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It may be noted that if the numbers of units from the whole population 
falling in different strata are known, and a random sample is taken which is 
sufficiently large to ensure that adequate numbers of units are obtained from 
all strata, adjustment of the results so that the different strata are represented 
in their correct proportions will lead to practically the same accuracy as would 
be obtained with a stratified sample. Which of these two alternative courses 
is adopted in any particular case is a question of convenience. If the selection 
of either type of sample is equally simple it is best to use a stratified sample, 
as the computations are thereby simplified. In certain "“cases, however, the 
classification of units into strata may only be possible by means of information 
obtained in the course of the survey, in which case a random sample, with 
subsequent adjustment, is required. Thus, for example, in a survey of a human 
population, the age distribution of the whole population may be known, but 
prior selection of individuals of particular ages may be impossible owing to 
lack of information on these ages, 

3.4 Multiple stratification 

A population may be stratified for two or more dif feren t character igtic s. 
If selection Is made from sub-strata formecT of the various combinations of the 
main classifications the procedure is exactly equivalent to ordinary stratification, 
thd sub-strata being equivalent to strata. Thus we may stratify farms 
according to size and according to geographical regions. If the farms in each 
region are classified into size-groups before taking the sample then the region— 
size-group combinations form the individual sub-strata. 

'Occasionally the number of units of the population falling in each set of 
main strata may be known, e.g. from prior census data, but not the numbers in 
the various sub-strata. Thus, in the above example there may be information 
on the numbers of farms in the different size-groups, and also on the numbers 
in the different geographical regions, but not on the numbers of each sizfe-group 
in each region. In such cases we may attempt the selection of a sample which 
will have the right proportions for each set of main strata. Such stratification 
may be termed multiple stratification without control of sub-strata. The selection 
of such a sample, however, presents both theoretical and practical difficulties, 
and the calculation of the sampling error is also troublesome. 

In the rare cases in which multiple stratification without control of sub-strata 
is deemed to be necessary a simple procedure of selection which should give 
a reasonably satisfactory sample is as follows. Units are selected at random 
until the total of every row and column of the two or more-way table for the 
sets of strata is at least equal to the required total. The excesses of these 
marginal totals are calculated, and numbers chosen for deduction from the 
sub-strata totals which together make up these excesses, and which, subject 
to these restrictions, are about proportional to the sub-strata totals. (A method 
of calculating such numbers is shown in the following example.) The 
corresponding numbers of units are then rejected from the sub-strata groups, 
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those rejected being selected at random. If the original selection was strictly 
random the condition of randomness will be fulfilled if those last selected 
are rejected. 


Example 3.4 

A sample of 1000 is required from a population classified into two sets of 
four strata, the sub-strata totals being unknown, but the correct strata totals 
for the sample being known to be 120, 280, 350, 250, for each set of strata. 

After a sample of 1125 units had been drawn, the numbers of units in the 
16 sub-strata shown in Table 3.4.a were obtained. 


Table 3.4.a—T wo-way stratification without control of 

SUB-STRATA : INITIAL SAMPLE 


Strata B 

Strata A 

Total 

Required 

Excess 

1 

2 

3 

4 

1 

37 

40 

35 

8 

120 

120 

0 

2 

39 

140 

82 

56 

317 

280 

+ 37 

3 

45 

97 

173 

93 

408 

350 

' + 58 

4 

8 

40 

86 

146 

280 

250 

+ 30 

Total 

129 

317 

376 

303 

1125 

1000 

125 

Required 

120 

280 

350 

250 

1000 

— 

— 

Excess . 

+ 9 

+ 37 

+ 26 

+ 53 

125 

■ 



Table 3.4.b— Calculation of numbers of units to be rejected 



Stage 1 

Stage 2 


At 

A t 


A. 

Total 

A 1 

As 

^3 

A 4 

Total 


0 

0 

0 

0 

0 

0 

0 

0 

0 

0 


5 

16 

9 

7 

37 

- 1 

+ 2 

- 4 

+ 3 

0 

B , 

6 

14 

25 

13 

58 

- 2 

+ 1 

- 9 

+ 5 

5 

B t 

1 

4 

9 

16 

30 

0 

0 

- 4 

+ 9 

+ 5 

Total 

12 

34 

43 

36 

125 

- 3 

+ 3 

- 17 

+ !7 ; 

0 

Required 

9 

37 

26 

53 , 

125 




Difference 

+ 3 

- 3 

+ 17 - 

17 

0 







26 


STRUCTURE OF VARIOUS TYPES OF SAMPLE 


sIect. 3.4 


Table 3.4.b —Continued 



Stage 3 


A 2 

A 3 

A t 

Total 

B t 

0 

0 

0 

0 

0 

B, 

0 

0 

0 

0 

0 

B s 

+ 1 (0) 

+ 1 

2 

+ 1 (+ 2) 

+ 5 

B& 

0 

-1 

- X (- 2) 

-3 (-2) 

- 5 

Total 

+ 1 (0) 

0 

+ 1 (0) 

-2 (0) 

0 


Table 3.4.c —Numbers of units rejected, numbers in final sample and 

CORRECT SUB-STRATA TOTALS 



Numbers of units rejected 

Numbers in final sample 


A 

A z 

A 3 

A t 

Total 


A 2 

A 3 

A, 

Total 

-B. 

0 

0 

0 

0 

0 

37 

40 

35 

8 

120 

B t 

4 

18 

5 

10 

37 

35 

122 

77 

46 

280 

B« 

4 

16 

18 

20 

58 

41 

81 

155 

73 

350 

B t 

1 

3 

3 

23 

30 

7 

37 

83 

123 

250 

Total 

9 

37 

26 

53 

125 

120 

280 

350 

250 

1000 




Correct 

sub-strata totals 


At 

^2 

^3 

A 4 

Total 

B 1 

40 

40 

30 

10 

120 

B % 

40 

120 

80 

40 

280 

B a 

30 

80 

160 

80 

350 

B* 

10 

40 

80 

120 

250 

Total 

120 

280 

350 

250 

1000 


The three stages of the calculation are shown in Table 3.4.b. In stage 1 
the excesses of the rows have been distributed in proportion to the numbers 
of units in the sub-strata of each row. The distributed excesses are added by 
columns and compared with the required excesses. The differences, with 
signs reversed, are distributed by columns in stage 2 , excluding the first row, 
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and the process is repeated for rows in stage 3. The numbers are now small 
and empirical adjustments, shown in brackets, have been chosen to make 
the column totals zero. 

The three stages are then summed and deducted from the numbers in the 
original sample, as shown in Table 3.4.c. This table also shows the correct 
proportional sub-strata totals of the population from which the sample was 
drawn. The appropriate statistical test* shows that a Isatisfactory sample has 
been obtained. 

The above process does not necessarily converge, but will usually do so 
in practical cases. If a negative value for number of units rejected is obtained 
for any sub-stratum this can be brought to zero by an empirical adjustment. 

3.5 Stratification with a variable sampling fraction 

In certain types of material very considerable gains in accuracy will result 
if different sampling fractions are used for the different strata. The greatest 
accuracy for a given number of units will be attaine d if t he sampling fractions 
are ^rd porti onal to the within-strata stan dard devia tionsf of the units. If the 
sampling fractions are denoted by f lt / 2 > • • • and the standard deviations by 
c 1? cr 2 , ... we have 

fi = h = 

C 1 a 2 

In some cases this formula may give sampling fractions greater than unity for 
some of the strata. If this occurs the whole of these strata are included in the 
sample. , . rfF < . 

A particularly important application of the variable sampling fraction is 
to material stratified into size-groups. In such material the various quantitative 
characteristics of the units under investigation often have within-strata standard 
deviations which are roughly proportional to the mean sizes of the uni^s in the 
different size-groups. In this case the sampling fractions should be taken about 
proportional to these mean sizes. If quantitative characteristics very highly 
correlated with size of unit are under investigation, the ranges of the size- 
groups may give good estimates of the relative within-strata standard deviations. 
The sampling fractions may then be taken proportional to these ranges. 
Changes with time, however, are usually by no means so highly correlated with 
size, and when the changes are of interest, sampling fractions proportional to 
the mean sizes of the size-groups will usually be best. 

The above rules will determine sampling fractions which give the maximum 
accuracy for estimates of the population values. In cases in which the values 
for th t individual strata are of interest, i.e. cases in which the strata themselves 
form domains of study, it is also important to see that all the strata are adequately 

• * = 9*1, 9 degrees of freedom. . 

*|* The meaning of this term” and the method of estimation will be explained" in 
Chapter 7. 
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represented in the sample, and for this reason the ndg.of strict proportionality 
t^ej^juide^atisns. or Jo jhemean sizes of th_e,size-groups, oftenrgquire 

some modification. . . . rtl , r * 

“When several quantities are under investigation, it will usually be found 

that their within-strata standard deviations are not in quite the same proportions. 
This, however, is not a very serious problem in practice, since JffP Mg 
fractions which are somewhere near the optimal wiU give results which are 
i^M^c55ET3r^''^iii'^y the optimal fractions. Consequently 
there is usually no great difficulty in choosing suitable sampling fractions 
which will reconcile the various conflicting requirements. . 

Since, for these reasons, sampling fractions are often used which are not 
optimal, we have preferred not to adopt the term “ optimal allocation, which 
has sometimes been used to denote stratified sampling with a variable sampling 

fraction. within _ gtrata standard deviations can only be estimated from data 
relating to the material to be sampled, or from data derived from similar 
material, but general knowledge of the behaviour of material of a particular 
type, e.g. material stratified into size-groups, will often enable suitable sampling 
fractions to be chosen with all necessary accuracy. It is sometimes suggeste 
that a preliminary survey should be undertaken merely to determine the optimal 
sampling fractions, but this is rarely worth while, though if a preliminary survey 
is being undertaken for other purposes it will of course also serve to improve 
the sampling fractions. 

3.6 Systematic samples from lists 

Although the importance of the principle of random selection in sampling 
has been stressed, much practical sampling is in fact not fully random in 
character. Thus a frequent method of selecting a sample, when a list of the 
units of the population to be sampled is available, is to take every qth en ry 
on this list. This may be termed a systematic sample from the list. Othe 
more complicated systematic procedures may occasionally be adopted tor 

speciali purPtomary, ^ salutary> t0 dete rmine the first entry by selecting a 
number at random between 1 and q, but this element of randomness does not 
convert the sample into a random one* A systematic sample would be 
equivalent to a fully random sample if the list were arranged wholly at random. 
No lists, however, are arranged at random. The nearest approach to random 
order is probably provided by alphabetical lists, though even these have certain 
non-random characteristics : in this country, for instance, a large proportion 
of the Scotsmen will be found under the letter M. If every qt entry ls a ei b 
a kind of partial stratification will therefore be obtained, and the sample 
will be somewhat more precise than a fully random sample. Thus m a 

* Except in the trivial sense that the sample is a random sample of 1 unit: out of 
q unite ea P ch unit being composed of the aggregate of a set of all entries at spacing q. 
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systematic sample of farms taken from a list of farms arranged by parishes 
the Proportion of farms drawn from each parish will be moref less Constant! 

STparish 6 SamP mg mterVal is SmaI1 c o m P a red with the number of farms 

to the lack of definition of the strata it is impossible to msk P a fully 
val d estimate of the sampling error, but provided there are no periodic 

i l? ” mp !r r“ "°* be biased - A” 

error which is good enough for practical purposes can usually be made by 
regarding the sample as a sample stratified in the major subdivisions of the list 
If ™17 a ™2°l7 d Roupings. H the Sampling error is esS 

,W V k P 6 fu ! y random an overestimate will be obtained, the 
entr£1?',te U |?„ 8I '“ ,ir m “ e * he of the “ighbouri^ 

5 enera1 ’ systematic sampling from lists will be found to be quite 

in die C lis7which de<1 ^ ™ T S6e that there are no P eri odic features 
n the list which are associated with the sampling interval. The method is 

sfnf tlflf m ° re f COn , Vement than fand0m or stratified random sampling, 
since the labour of makmg a proper random selection, which in an extensive 

samplmg scheme is often very considerable, is avoided. It must be deX 

recognized, however, that the responsibility for the judgment that the material 

investigator. SyStematlC sam P Iln E wl11 give satisfactory results rests with the 

distfnfiribed Whkh t f Selectio , n is wholly systematic should be clearly 
distinguished from sampling in which there is proper random selection of 

sampling units which are themselves systematic aggregates of smaller sub¬ 
divisions of the material. Thus a common method offampling rows of potatoes. 

zssvst of samp,in8 au * he 


material ° f alternative wa y s of sampling highly variable 

discule^in ^ Vali ° US meth ° ds ° f sam P lin S which have been 

the nrowf f fl ^ We WiU COnsider their application to 

the problem of determining the area under wheat in an English countv 

utilized PUrP0SC 3CreageS ° f Hertfordshire ^s for 1939 wem 

The choice of problem and of basic material were dictated partlv bv 

variab^materkl ^ ^ ^ ^ ed *° T E com P lete set of dat a covering highly 

variable material, since an investigation of different sampling methods can be 

carried out most easily when data are available for the whole of the material 

t should be emphasized that this example is to be regarded as illustrative 

nly. Sampling methods are very unlikely to be required for the determination 
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of crqp acreages in a country such as England, since all farmers make returns 
of their acreages each year ; the only possible use of sampling would be at 
the compilation stage, where it might be used to avoid the necessity of totalling 
the whole of the returns. The present investigation is not fully relevant to 
this problem, however, since the acreage of only one of the most extensively 
grown crops is considered, and that for only one county. Considerably greater 
errors, proportionately, may be expected in the less common crops. 

Records were available for 2496 farms, and the acreage of wheat, and also 
the total acreage of crops and grass, which is virtually the total acreage of 
farmed land and will be termed the size of the farm, were abstracted for each 
farm. The original records were arranged by districts, by parishes alphabetically 
within districts, and by farmer’s name alphabetically within parishes. This 
order was preserved in the abstract. The return for any farm, or “ holding,” 
does not necessarily relate solely to land in a given parish, but may include 
land in other parishes farmed by the same farmer j farmers with two or more 
distinct farms may make separate returns for these farms or may include them 
all in a single return. The total area of wheat in the county, from the abstracted 
returns, was 44,676 acres, and the total area of crops and grass was 273,074 
acres.* 

If farms are taken as sampling units the dominant source of variation in 
wheat acreage will be variation in size of farm, since farms range from 1 acre to 
over 1000 acres, and no farm can have more than a fraction of its area under 
wheat. Stratification by size of farm is therefore indicated. The use of a 
variable sampling fraction will also be advantageous, since the wheat acreages 
of the large farms will be much more variable than those of the smaller farms. 
Further stratification by districts is possible, but is not likely to give much 
increase in precision unless the incidence of wheat growing in the different 
districts is very markedly different. In any case a systematic method of 
selection from the list, which in view of the alphabetical method of arrangement 
will be quite satisfactory, will give the effect of stratification by districts. 

For comparative purposes the following samples were taken : 

(1) a random sample of 1 in 20 farms, 125 farms in all ; 

(2) a stratified random sample with a uniform sampling fraction of 1 in 20 ; 

(3) a stratified, systematic sample with a variable sampling fraction, the 

fraction being approximately proportional to mean size of farm within 
each size-group, and chosen so as to give about the same number of 

farms, actually 135, as samples 1 and 2 ; the systematic method of 

selection within size-groups results in approximate stratification by 
districts also. 


cv ^ ma y ^hat these values disagree with the values shown in Agricultural 

Statistics, viz. 46,281 acres and 278,380 acres respectively. The reasons for this 
discrepancy need not concern us here, but it provides an illustration of the fact 
that disagreement between sample and complete returns must not be assumed to 
be necessarily solely due to sampling error. 
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The size-groups chosen, the number of farms in each group, and the 
sampling fractions and numbers of farms for sample 3 are shown in Table 3.7. a. 
For sample 2 the last two size-groups were combined. 

Table 3 . 7 . a—H ertfordshire farms, 1939: size-groups, 

NUMBERS OF FARMS, AND CHOSEN VALUES OF THE VARIABLE 
SAMPLING FRACTION 


Size-group 
(acres crops 
and grass) 

No. of 
farms in 
county 

Sampling 

fraction 

No. of 
farms in 
sample 

1-5 

435 

Nil 

0 

6-20 

519 

1/200 

3 

21-50 

357 

1/60 

6 

51-150 

519 

1/20 

26 

151-300 

400 

1/10 

40 

301-500 

215 

1/S 

43 

501- 

. 51 

1/3 

17 


2,496 


135 


As pointed out in Section 3.3, the random sample can be stratified after 
selection so as to eliminate the effect of variation between size-groups, provided 
the number in each size-group of the population is known. Variation due to 
size may also be eliminated by using size as supplementary information, provided 
information on size of farm is available for each farm in the sample, and the 
total area of all farms in the county is known. Either the ratio or the regression 
method may be used, as explained in Chapter 6. 

The estimates of the wheat acreage and of the number of farms growing 
wheat obtained from these samples, their estimated sampling standard errors, 
and in the case of the acreage the actual errors of the estimates, are shown 
in Table 3.7.b. The sampling standard errors, as will be explained in 
detail in Chapter 7, give measures of the average magnitudes of the sampling 
errors that may be expected with given methods of sampling and of estimation. 
The computations leading to these estimates are set out in Chapters 6 and 7, 
where tables giving the actual values of the sample units for the three samples 
(Tables 6.6.a, 6.5.a and 6.7.a respectively) will also be found. 

It is apparent from the values of the sampling standard errors for wheat 
acreage that, as is to be expected, both stratification and the use of a variable 
sampling fraction have resulted in large gains in accuracy* The use of 
supplementary information on size, whether by stratification after sampling, 
by ratio or by regression, serves to make the random sample about as accurate 

* The word accuracy is used in this book to denote the expected accuracy of an 
estimate, as indicated by its sampling standard error. It is occasionally also used to 
denote the actual accuracy, as indicated by the actual error (usually unknown), but this 
should not cause any confusion. 
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as the stratified sample with a fixed sampling fraction. This is also to be 

* XP The d numbers of units, i.e. farms, required to attain the same accur fjY 
with different methods of sampling and estimation may be taken ‘ J 
proportional to the squares of the standard errors, allowance being made 
for the greater number of units included in sample 3. These are shown in 
the column “ relative variance,” the random sample with direct estimation 

Table 3.7.b—H ertfordshire farms: comparison of various types of 

SAMPLES AND METHODS OF ESTIMATION 


Method of 
selection — 
and 

estimation p 


Wheat acreage 


No. of farms growing wheat 


Estm ia tJ Standard | ActUal variance Estimate St “ dard ) variance 
Estimate error error I {arm error per farm 


Random, 

direct 

46,020 

± 7,950 

+ 1,340 

100 

Random, 

stratified 

after 

selection 

41,100 

± 4,320 

- 3,580 

30 

Random, 
by ratio 

41,570 

± 3,940 

- 3,110 

25 

Random, 

by 

regression 

j 40,400 

dr 4,130 

- 4,280 

27 

Stratified, 

direct 

40,220 

i 4,110 

- 4,460 

27 

Variable 

42,765 

± 2,550 

- 1,915 

10 


900 ± 104*6 .100 
860 ± 7 5*2 52 

Not calculated 
Not calculated 

1,080 ± 71*6 47 

911 ± 88*9 .72 


sampling 

fraction, 

direct 


being set at 100. Thus stratification by size, or elimination of variation due to 
size in the estimation process, reduces the number of farms required by a 
factor of about 4, and the variable sampling fraction results in a further 

reduction by a factor of about 21. . , 

The situation with respect to number of farms growing wheat is somewhat 
different. Stratification has again resulted in considerable increase in accuracy, 
though the gain is not so great as with acreage.- The sample with a variable 
sampling fraction, on the other hand, is not so accurate*! the ordinary stratified 
sample. The sampling fractions which are optimal for the determination o. 
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wheat acreage are by no means optimal for the determination of the number 
of farms growing wheat. numDer 

standard aCtUaI errors of u the estimates bear little relation to the sampling 

standard error's H 7 n ° CaSe markedly lar S er than' dies! 

standard errors. The random sample without adjustment gives the most 

standard ®f lmate ,° f acrea S e > thou gh this estimate has the ifrgest sampling 

Sir fho!!h’it a ! H JU T em ° f , the rand0m Sample makes th " actual !rrof 

of the fact that an dUCeS ^ Samp i m f standard error - This is an illustration 
of the fact that an inaccurate method of sampling will sometimes by chance 

be iudie? 1 bv1h e e Stlmate; J He aCCUracy of a sam P Iin g Procedure must never 
judged by the magnitude of a single discrepancy; a large discreoancv 
provides some evidence that a method is inaLJe, but fsinSTS 
discrepancy provides practically no evidence that it is accurate. § 

3.8 Multi-stage sampling 

Of kn multl ‘ stage sam pbng the material is regarded as made up of a number 

t2 e “ mp Th g u "“ s v each of wtich is made “p ° f ■ 

tage units etc. The sampling process is carried out in stages. At the first 

r2?do hC fi r~, S fl!i e Umt r afe sam P led b y some suitable method, such as. 

nhti°lt r i ,f C S3mP !' ^ thC SeC ° nd Stage a Sample of second-stage 
units is selected from each of the selected first-stage units again by 

some suitable method, which may be the same as ordifferent torn the 
required. emP ° yeC ^ first ' stage units ‘ Father stages may be added as 

Multi-stage sampling introduces a flexibility into sampling which is lacking 
of tho slm P ler methods ; It enables existing natural divisions and subdivision! 

in SecZ T9 t° ^ 3t the stages > a * d > aa Pointed ou! 

2 ' 9 ’ , pc the concentra tion of the field work of censuses and' 
surveys covering large areas. On the other hand, for the reasons there given 
a multi-stage sample is m general less accurate than is a sample containing 
the same number of final-stage units which have been selected by some- 
suitable single-stage process. y 

• , Multl 'T ge sam P J ing also has the important advantage that subdivision 
ipto second-stage units, i.e. the construction of the second-stage frame need 

t°he y samnTe rrie T d t “‘i” ?“* f units which ^ actually included in 

the sample. It is therefore particularly valuable in surveys of undeveloped 

sThdiv W « f°, e CXi f tS Which is efficiently detailed and accurate for 
subdivision of the material into reasonably small sampling units. 

for anv C t ! ^ T* y ° f muIti - sta g e sampling which are possible 

y given type of material, careful investigation is often required before a 

reached* & This m^^inf 5 iS beSt f ° r any P articular P ur P°se can be 

ched. This matter will be discussed in detail in Chapter 8, after the methods 

of evaluating sampling errors have been described 
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3.9 Sampling with probabilities proportional to size of unit 

If we have areas demarcated on a map, such as fields, and a point is located 
at random on the map, the probabilities of the point falling within the boundaries 
of the different fields are clearly proportional to the areas of the fields* 
Consequently areas can be selected at random with probabilities proportional 
to their size by the simple procedure of taking random points on the rnap. 
It will be noted that such a process of selection may result in the same area 
being included twice or more in the sample. In this case it must be counted 
twice or more. We cannot, without distorting the probabilities, make a further 
selection in the manner followed with equal probabilities. 

The principle has applications in agricultural surveys designed to determine 
the acreage and yield of different agricultural crops, total cultivated area, etc. 
All that is required for acreage is to determine the proportion of points which 
fall in areas of the given type. The method is therefore particularly attractive 
when carrying out surveys of the areas of crops, etc., by aerial survey, provided 
the different crops can be recognized on the photographs, since it avoids all 
the measurements of area which would be required if . an ordinary random 
sample of areas were taken. The sampling of the fields with probabilities 
proportional to size is in this case equivalent to the sampling of small unit 
areas of equal size whose locations are determined by the random points. 
When only areas require to be determined the sizes of the fields in which the 
random points fall are in fact immaterial. 

The analogy with the case of a stratified sample with a variable sampling 
fraction indicates that under certain circumstances greater precision may be 
expected from areas selected with probabilities proportional to size than will 
be obtained if they are selected with equal probabilities. 

In the case of yield determinations, when the total acreage is known, the 
determinations of the yield from a sample of fields selected with probability 
proportional to size may always be expected to give a more accurate estimate 
of the mean yield per acre and total yield than will similar yield determinations 
on a random sample of fields irrespective of size. If the total acreage is not 
known then the situation is more complicated, but here again sampling with 
probabilities proportional to size is often advantageous. 

Sampling with probabilities proportional to size of unit, or to some other 
known quantitative character of the units, may be carried out on other types of 
material by forming a cumulative or running total of the sizes of the units, and 
selecting numbers at random from the total of all the units in the manner of 
Example 3.2.c. Stratification by size and the use of a variable sampling 
fraction will usually be preferable in such cases, however, on the grounds both 
of accuracy and convenience, except in the special circumstances to be described 
in the next section. 
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3.10 Sampling from within strata with probabilities proportional to 
size of unit 

Apart from area sampling, sampling with probabilities proportional to size 
of unit is mainly of use when the units are stratified according to some other 
characteristic, and the number of units to be selected from each stratum, or 
from some of them, is small. In this case an ordinary stratified sample will 
give either inaccurate or biased estimates when the ratio method of estimation, 
explained in Chapter 6, is used. The bias or inaccuracy is removed by selecting 
the units from within strata with probabilities proportional to size. This 
fact appears to have been first recognized by Hansen and Hurwitz (1943, A). 

This procedure is particularly useful in conjunction with two-stage sampling 
with large first-stage units of variable size. The first-stage units are selected 
from within strata with probabilities proportional to size, and the second-stage 
units are selected with probabilities inversely proportional to the sizes of the 
first-stage units. By this device the overall sampling fraction is kept constant, 
with consequent simplification of the computations. 

As before, when more than one unit is required from a stratum, selection 
with probability strictly proportional to size can only be simply effected if a 
unit which happens to be selected twice is counted twice. Generally, however, 
when each stratum contains only a few large units, duplication of units is not 
desirable ; instead a further unit is selected, with slight inexactitude in the 
probability of selection, and consequent slight, but usually negligible, bias 
in the results. 


3.11 An example of sampling by administrative areas 

Reverting to the illustrative example of Hertfordshire farms considered in 
Section 3.7, we may now investigate the effect of taking parishes as sampling 
units, or as first-stage units in two-stage sampling with farms as second-stage 
units. 

The use of parishes as sampling units may be expected to result in a sample 
which is somewhat less accurate than a sample containing the same number 
of farms distributed over all parishes. Nevertheless, if actual visits have to 
be made to the farms it may pay to use parishes as sampling units, or as first- 
stage units in two-stage sampling. In analogous situations in undeveloped 
countries, where definition of farm boundaries may present difficulties, the 
complete survey of small administrative or other areas may also be better than 
any attempt to sample individual farms. 

Inspection of the Hertfordshire data showed that the total farm area (crops 
and grass) included in the returns of farmers of a single parish was very 
variable, partly owing to variations in size of the parishes, and partly because 
some of the parishes were mainly urban in character. It is therefore best to 
use individual parishes as sampling units only if they contain a certain minimum 
acreage of crops and grass. The minimum chosen was 2000 acres, the remaining 
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parishes being grouped roughly in the order in which they appeared in the 
alphabetical list, to form “ combined ” parishes containing over the minimum 
acreage. The effect of this combination is shown in Table 3.11.a. 


Table 3.11. a — Numbers of farms, parishes, and “ combined ” parishes 
in the districts of Hertfordshire 


District 

No. 

District 

..1 

No. of 
farms j 

1 

.":v,: 

; 

No. of 
parishes 

No. of 

parishes after 
combination 

■ ■. 

1 

Barnet . 

230 

17 

7 

2 

Bishop’s Stortford . 

316 

23 

16 

3 

East Herts. . 

564 1 

31 

! 20 

4 

Hitchin 

553 I 

36 

! 25 

5 

St. Albans 

218 

9 

!. 7 

6 

Tring . 

424 1 

16 

i H 

7 

Watford 

191 

8 

| 5 ' 


* 

2,496 ■ 

! . .A- 140 

1 - 

; 91 1 


Districts were used as strata in this sampling, 1 in 5 “ combined ” parishes 
being taken per district, i,e. 17 parishes in all. 

Two samples were taken. In sample A the parishes were selected in; the 
ordinary manner, with equal probability of selection for each parish. In 
sample B selection with probabilities proportional to size was employed. The 
parishes of sample B were also sub-sampled in two ways, samples B 1 and B%+ 
In sample B x a uniform sampling fraction of \ was taken for sampling at the 
second stage, with stratification by size, using the size-groups of Table 3,7. a 
with the last two size-groups combined. In sample jB 2 a variable sampling 
fraction was used with values for size-group 1-50 acres, J for fl-150 acres, 
i for 151-300 acres, and 1 for over 300 acres. Sample B is given in detail 
in Example 6.17. 

The relative efficiency of the Various methods is discussed in Section 8.9. 
The results are summarized in Table 3.11. b. This table is similar to 
Table 3.7.b, except that estimated average values of the standard errors are 
given, and not those calculated from the actual selected samples. These latter 
are not sufficiently accurate for comparison owing to the small number of 
parishes involved. 

It will be seen that a sample of 1 in 5 parishes provides results which are 
decidedly more accurate than a stratified random sample of 1 in 20 farms 
with a uniform sampling fraction, but somewhat less accurate than a similar 
sample with a variable sampling fraction. The stratified random sample of 
1 in 20 farms is 1-29 times as accurate as sample B 1} allowing for differing 
numbers of farms. The similar sample with the variable sampling fraction is 
1*83 times as accurate as sample jB 2 . Sample B is somewhat more accurate 
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than sample A, particularly if the unbiased estimate given by the overall ratio 
is used for sample A. The difference is not marked, however, since the com¬ 
bination of parishes has created units which' do not differ excessively in size. 

Table 3.11.b— Hertfordshire farms: samples for wheat acreage with 

“ COMBINED ” PARISHES AS SAMPLING UNITS OR FIRST-STAGE UNITS IN 
TWO-STAGE SAMPLING 



; No. 
of 

Method of Sampling 







Sample 




Method 

of 


Expected 

standard 

Actual 

Relath 




Estimate 


stages 

1st stage 

2nd stage 

estimation 

error 

error 

variant 






r Overall 

41,730 

4- 3,080 

- 2,950 

100 






ratio 




A 

1 

Stratified by 

— 

* 








district 



District 
^ ratios 

41,010 

± 3,010 

- 3,670 

95 

B 

1 

Stratified by 

— 


District 

46,660 

± 2,870 

| 

+ 1,980 

87 



district, 
probability 
proportional 
to size 



ratios 

• ■; i 

! 









! 

Bi 

2 _ 


Stratified 


District 

48,930 

± 4,950 

+ 4,250 

259 




random 


ratios 

i 

b 2 

2 


Variable 


District 

1 

45,600 i 

± 3,460 

+ 920 

127 




sampling 


ratios 






fraction 



1 





3.12 Multi-phase sampling 

It is sometimes convenient and economical to collect certain items of 
information from the whole of the units of a sample and other items of 
information from some only of these units, these latter units being so chosen 
as to constitute a sub-sample of the units of the original sample. This may 
be termed two-phase sampling. Further phases may be added if required. 

Multi-phase sampling is of value in several ways. Its simplest application 
is to the case in which the number of units needed to give the required accuracy 
on different items is widely different, either owing to the fact that the variability 
of the associated variates is different, or because the accuracy required is 
different. If no use is made of the relations between the different variates, 
such multi-phase sampling is equivalent to taking samples of different sizes 
for the different items. 

First-phase information may also be used as supplementary information in 
order to improve the accuracy of second-phase information, by the same 
methods, ratio and regression, that are applicable where supplementary 
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information on the whole population is available. Thus, in a crop estimation 
survey based on farms as sampling units, a relatively large sample of farms 
may be taken for the determination of the acreage of the crop, and the yields 
may be determined on a sub-sample only of these farms. 

If the first-phase information is collected prior to the second-phase 
information the first-phase information may be used as a basis for the sub¬ 
sampling process, e.g. by stratification of the first-phase units for the selection 
of the second-phase sample, with or without the use of a variable sampling 
fraction at the second phase. 

It will be noted that in both these latter applications of two-phase sampling 
the methods followed are the same as those adopted in ordinary single-phase 
sampling, the population being replaced by the first-phase sample ; but since 
the first-phase information is not known for the whole population it is itself 
subject to sampling error, and this must be taken into account when estimating 
the sampling errors of the estimates of the second-phase variates. 

Multi-phase sampling differs structurally from multi-stage sampling in 
that in the former the same sampling units are used throughout, whereas in 
the latter a hierarchy of sampling units is used. Multi-phase sampling may be 
combined with multi-stage sampling. In a scheme for the estimation of the 
acreages and yields of agricultural crops, for example, a two-stage sample of 
farms and parishes may be taken for the estimation of acreages, and a sub¬ 
sample of these farms may be taken for the estimation of yields. 

3.13 Balanced samples 

If the average value of some quantitative character of the units, such as 
size, is known for the whole population, it is possible, provided the sizes of 
the individual sampling units are known, to s elect a sample in such^a ^manner 
that th e average size of th e selected units is equal to the average size of all 
foej units of the population. Such a sample will only be satisfactory if it is 
otherwise equivalent to a random^ sample, in which case it may be termed 
a balanced sample. 

Balance may be employed in conjunction with stratification for some other 
character. In this case balance may be effected either for the whole population, 
or for each of the strata separately. The latter course should only be adopted 
if the number of units selected from each stratum is moderately large : otherwise 
undue restrictions will be placed on the sample which will result in the selection 
of a sample which is not otherwise equivalent to a random sample. On the 
other hand, when the strata are balanced separately more accurate estimates 
of the separate strata means and totals will be obtained, and the accuracy 
of the estimates of the overall population means and totals may also be 
somewhat improved. 

Balancing for a known quantitative character provides an alternative to 
stratification by size-groups in this character. Balancing, however, will only 
be effective if the differences in the quantity or quantities under investigation 
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are approximately proportional to differences in the known character, whereas- 
stratification by size-groups will take account of any type of relationship. As 
will be seen in- Chapter 7, the estimation of sampling errors is simpler in the 
case of a stratified sample. 

The increased accuracy resulting from balancing can equally well be 
obtained, at the expense of some additional computational labour, by adjusting 
the results of an unbalanced sample by the use of regression in the manner 
explained in Chapter 6. Since the additional labour of adjustment is nearly 
proportional to the number of variates under investigation, the advantages of 
balancing as opposed to regression increase as the number of variates increases. 

Balancing can also be carried out for a character which is inherently 
qualitative, but which for the sampling units actually employed acts as a 
quantitative variate because the sampling units are themselves aggregates of 
smaller natural units. Thus if a sample of a human population composed of 
two different races is being taken, the sampling units being administrative 
areas containing numbers of individuals, the sample can be balanced for the 
percentages of individuals in the two races. If the individuals were the sampling 
units then balance would be equivalent to stratification by races. 

A balanced sample is best selected by using a process of replacement. 
In the first place a random or stratified random sample of the required size is 
selected, record being kept of the order of selection. The average value of 
the known quantitative character is then calculated for the sample. This will, 
in general, not be equal to the average for the population, indicating lack of 
balance. A further sampling unit is then drawn, and compared with the first 
unit of the original sample. If balance is improved by substitution of the 
new unit, this is done, otherwise the original unit is retained. The process 
is then repeated for the second and following units of the original sample 
until an adequate degree of balance is attained. 

The selection of a sample balanced for more than one character presents 
more difficult problems, and will not be discussed here. 

Balance, in the cruder form known as purposive selection , was at one time 
extensively used in sample censuses and surveys. No rigorous rules of 
selection were followed, however, with the result that many purposively selected 
samples were by no means equivalent to balanced random samples. Thus it 
frequently happened that the selection was confined to sampling units having 
values of the known quantitative character near the average. Clearly in such 
samples the variability of the known quantitative character, and of any other 
characters closely correlated with it, will be considerably less than the true 
variability in the population. The sample may also be unrepresentative in 
other ways. 

Purposive selection was often used in an attempt to avoid the necessity, 
which was otherwise apparent, of employing reasonably small sampling units. 
Thus Gini and Galvani (1929, A) selected a sample from the Italian Census Data 
of 1921 which consisted of all the returns of 29 out of the 214 circondari into 
which the country is divided, using seven control characters. Agreement of 
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lnd a thaT 8 of V i e f ° f ° ther ? ara u ters in the SampIe and population was poor, 
Tht I f ? frequency-distributions of such characters was even worse 

Waller u^ite a Ae S us r f “ ^ ° f , ex< : essivel y lar g e units, though even with 

smaller units the use of purposive selection without rigorous rules of selection 

rf jis*rllm ty “ sa,is6e,0ryThere »■ «. «o ~ 

a _ T° r th !f e reason ®. Purposive selection has ceased to be extensively used 
and m modern sampling work it has largely been replaced by more thorough 

is pakUrthe nm T ° f st f ifi oation, etc. Provided proper attention 

s paid to the process of selection, however, there is no fundamental objection 

to balanced samples. These have a certain limited usefulness in some types 
of census and survey work, though it must be recognized that the need for The 
subdivision of the population into an adequate number of sampling units is 
in no way obviated by balancing for one or more quantitative Characters. 

3.14 Systematic samples from areas 

A common method of sampling material continuously distributed either 

Ihe maLTal tlm Th S Units distribute d at equal intervals over 

the material The chief application in census and survey work is in the 

locTed n lvC n ' areaS - When -T apS are available sampling units can be 
located by superimposing a grid of points, frequently of square, or nearly 

square, pattern. Such a sample may be termed a systematic area sample * 

• t f ; Sy t m , atl , C area sampl ® dlffers from a systematic sample from lists mainly 
m the spatial distribution of the sampling units over the material. Most lists 
do not correspond at all exactly, except for major groupings, to any physical 
distribution, and a systematic sample from a list therefore usually approximates 
much more closely to a random sample than does a systematic !L sample! 

S the e tL m clse°s 6Stlmatmg the Sampling error are therefore appropriate 

wil/hTTT 1 ’ Pr ° Vided therC T n ° Peri ° dic features - a nematic area sample 
nTCV r m ° re 3CCUrate * an 3 Stratified random sam P Ie (^th one unit 
are situaTd sprata con srstin^ of rectangular blocks (or cells) whose centres 
re situated at the systematic sampling points. In material in which the 
t ariation ,s of a continuous nature it is impossible to make any accurate estimate 

r/ TT SamP g er ™ r ,. wlt , hout taking supplementary sampling points, though 
if there are no periodic features an upper limit can be obtained. g 

if the regions near the boundary are likely to differ from the remainder 

cliTnT*’ aS m T bC Ae C3Se lf the boundar y is a natural one, such as a sea 
coast or a mountain range, it will be best, after locating the sampling grid at 

andom, to demarcate the bounding lines of the cells, and sample at random 
the area which is not covered by complete cells, dividing it into equal or 

each ofXsTa eqUa r S „T d l0C3ting ° nC Sampling P oint at random within 

equal L area tTth 1 T u be conv , enient ’ if possible, to make these cell areas 
equal m area to those of the sampling grid, since equal weight will then be 
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given to all sampling points. The same method of dealing with boundaries 
can be used if the sampling is random within rectangular cells. . 

Systematic sampling is entirely unsuited to material which has periodic 
features but apart from this will generally provide a satisfactory method of 
area sampling ? It has the advantage over stratified random sampling from 
blocks that the location of the sampling units is simpler and ^he resuhs obtained 
nrovide rather better material for the construction of maps, etc. As in 
systematic sampling from lists, however, the responsibility s ^ sf ^ to ^ d f“ 
that the material is such that systematic sampling wil giv y 

vf-cts with the investigator. 


3.15 Line sampling 

In the sampling of areas certain types of information can 
almost as easily for all the points on a line as they can for a set of isolated 
points or areas! In such cases sets of parallel lines or strips may be taken 
as the sampling units. In stratified random line sampling, the area is divide 
into rectangular blocks of convenient length and of such a width that two 
selected sampling lines are included in each block, their location within the 
block being random. If an exact estimate of the sampling error is not required 
Sy onThne need be included in each block, with «n««poo^y 
blocks. In systematic line sampling the sample is made up of lines at equal 

int erval provides an alternative method to point sampling (Section 3.9) 

for determining fhe proportions of a given area which are of differen types 
These proportions are given by the ratios of the aggregate of the intercepts 
of the different types. In area determinations of this type, whether by lme ot. 
point sampling, systematic sampling can usually be adopted. The method 
is useful both for area determinations on the ground in undeveloped counry 
and for the determination of areas from maps and aerial photograp s. ^ 
method has been much used for forest surveys, where it is known as cruis g. 

If, instead of obtaining information for all points on each of e l , 
sample of such points is taken, the sampling becomes two-stage. If the 
andfdie points on them are both evenly spaced the sampling is equivalent 

(JpiSg ofTSewha, different type is also used in order to determine 
the acreage of agricultural crops in areas which are well provided with roads. 
A route if chosen which covers the whole area as adequately as possible, “d 
the lengths bordered by the different crops are measured. A car fitted wrth 
a special milometer can be used for this purpose. Estimates ofheyydne 
harvest time can also fee obtained in a similar manner, by stopping the car 
at every xth mile of a given crop, and cutting and harvesting a small area of 
the crop, the area being selected in some systematic manner, such as entering 
the crop a given number of paces at right angles to the roa . ^ 

. Line sampling of this type does not provide an unbiased sample, since 
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roads are by no means randomly located with regard to agricultural crops, 
lhe results from surveys following the same route in successive years may well 
be comparable, however, and with calibration by more exact methods from 
time to time, road cruising may provide a satisfactory method of making rapid 
and inexpensive surveys. Similar methods based on tracks are possible in 
areas with only a sparse road network. 

3.16 The principle of the moving observer 

If counts are required of a collection of individuals who are moving about 
the ordinary methods of sampling can only be applied with difficulty. Thus’ 
to determine the number of people in a crowded street by ordinary methods 
would require the demarcation of a number of small areas in the street and 
the counting of the number of people on each of these areas. The counts 
need not necessarily be simultaneous, but for any one area the number of 
people present at a given moment has to be counted. Unless photographic 
mrthods are available, or the areas are very small, such counts are extremely 
difficult, since individuals are continually moving into and out of the areas 
and are also moving about within them. 

Equally it is no use stationing observers at fixed points with instructions 
to count passers-by. The number of people in a street will depend not only 
on the numbers passing fixed points but also on the velocity of movement up 
and down the street. If all exits and entrances to the street are covered, and 
there are no people in the street at the start of the counts, the number present 
at any subsequent time can be determined from counts that are continuous 
and without error. In practice, however, errors in counting usually result in 
cumulative errors which invalidate the results. Thus it was found impracticable 
to determine the numbers in a department store by posting people at the doors 
to make counts. 

These difficulties can be overcome by using moving instead of stationary 
observers. To obtain an estimate of the number of people in a street the 
observer traverses the street in one direction, counting all the people he passes 
m whichever direction they are moving, and deducting all the people who 
overtake him. He then re-traverses the street in the opposite direction, moving 
at the same speed and counting as before. If this is done the average of the 
two counts gives an estimate of the average number of people in the street 
uring the time of the counts. If people are mostly moving in one direction 
the count in this direction will be reduced, but the count in the opposite direction 
will be correspondingly increased. In practice the deductions required for 
those overtaking the observer can be kept small by moving at a speed greater 
than that of the majority of the crowd. 

This method was used to estimate the numbers of people in streets, shops 
etc., at different times of the day, in order that the adequacy of the provisions 
for public air raid shelters might be tested. It was found that very dense crowds 
in streets and shops could be estimated with surprising ease. Crowded streets 
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were dealt with by teams of two or more observers, moving down the street 
in a transverse line, with each observer counting the people between him and 
the next observer. In the large stores the floor was divided into areas which 
were assigned to the different observers. 

The method is of general application. It can be used, for example, to assess ^ 
populations of insects or animals in a state of movement, provided all individuals 
can be readily seen, and provided the passage of the observer does not itself 
influence the movement of the individuals. 

3.17 Interpenetrating samples 

It is often advantageous to take two or more independent samples of a 
given population, using the same sampling procedure for each sample. Such 
samples are called interpenetrating samples. 

Interpenetrating samples are of value if the survey or census has to be 
carried out by successive stages. This is frequently necessary when preliminary 
results are required quickly. Thus in the 1942 Census of Woodlands of 
England and Wales, described in Section 4.25, it was necessary to obtain a 
preliminary estimate of the timber content with very limited field staff within 
six months of the initiation of the survey. The work was therefore planned in 
two interpenetrating samples. Before the field work was commenced, each 
unit of the first sample was subdivided into two, since it became apparent that 
the whole of the first sample could not be completed in the allotted time. 
This further subdivision itself created two interpenetrating samples of a special 
type. By means of this procedure it was possible to provide a preliminary 
estimate of the total timber content of the whole country by the time it was 

required. . 

An incidental advantage of interpenetrating samples is that separate and 
independent estimates of the characteristics of the population are furnished. 
The agreement of such estimates is often more convincing to the layman than 
any statement of the sampling error. 

Interpenetrating samples have a further use in that the different samples 
can be assigned to different investigators. Comparison of the results provides 
a check of the investigators against one another. 

To perform the functions outlined above, each of the interpenetrating 
samples must itself provide an adequate sample of the material and must be 
comparable with the other samples—in other words the samples must be 
really interpenetrating. If this is not the case the comparisons between the 
different samples will be subject to relatively large errors. If, for instance, 
they are used to test differences between different investigators, the information 
obtained will be of insufficient accuracy to be of any real use. Equally, if one 
sample is used to provide a preliminary estimate, this estimate may well not 
attain the required degree of precision. Thus in an agricultural survey stratified 
by areas such as counties, the division of the counties into two groups, with 
each of the samples confined to one group only, would not be likely to give 
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a satisfactory pair of interpenetrating samples. The separate samples would 
be subject to variation between counties, and would therefore be considerably 
less accurate than the combination of the two, from which variation between 
counties is entirely eliminated. The proper use of interpenetrating samples 
therefore necessitates increased expenditure on travelling. 

3.18 Sampling on successive occasions 

The types of sampling so far discussed are appropriate to a census or 
survey carried out on a single occasion, with the object of determining the 
characteristics of the surveyed population at or about a given point in time. 
If the population is subject to change, a survey carried out on a single occasion, 
however accurate, cannot of itself give any information on the nature or rate 
of such change. In certain types of population extraneous sources of 
information, such as registrations of births and deaths, may be relied on to 
provide information on the changes which the population is undergoing. Even 
in such cases the census must be repeated at intervals, both because of 
inaccuracies in the extraneous information, which may lead to a gradual 
accumulation of errors, and also because the information is rarely of such a 
nature that all aspects of the original census or survey can be kept up-to-date. 
Registration of births and deaths, for example, coupled with figures for 
immigration and emigration, will furnish data for the revision of the total of 
the population but will not enable changes in the population of separate towns 
and districts to be determined. 

In many cases no such extraneous information on the changes that are taking 
place is available, and in such cases provision must be made for periodical 
re-survey if up-to-date information is required. A number of alternatives 
then present themselves :—• 

(1) A complete census or survey may be repeated in its original form at 
intervals. 

(2) A sample census or survey may be repeated at intervals, a new sample 
being selected on each occasion without regard to previous samples. 

(3) A sample census or survey may be repeated on the same sample. 

(4) Part of the sample may be replaced on each occasion, the remainder 
being retained. If there are a number of occasions a definite scheme 
of replacement may be followed, e.g. one-third of the sample may be 
replaced, each selected unit being retained (except for the first two 
occasions) for three occasions. 

(5) A re-survey of a sub-sample of the original sample may be made. In 
the case of a complete census this is equivalent to a re-survey of a sample 
of the whole population. 

The following terms are suggested for the last four alternatives: 
(2) independent samples ; (3) fixed sample ; (4) partial replacement ; 
(5) sub-sample. It will be noted that independent samples are formally 
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equivalent to interpenetrating samples, a fixed sample is formally equivalent 
to the observation of different characters (variates) on the same sample, and a 
sub-sample is formally equivalent to a two-phase sample. Only partial 
replacement has no formal equivalent in the types of sampling already described. 
These equivalences are of importance in that the methods of estimation will 
be the same for formally equivalent sampling processes. 

The relative advantages of the various types of procedure depend on the 
relation between the variability of the units and the variability of changes in 
these units as well as on the relative importance of information on the 
population means and on the changes in these means. If, for instance, the units 
are very variable but the changes of all units are similar, accurate information 
on change can most easily be obtained by re-survey of a fixed sample of units ; 
provided always that proper provision is made for new entrants to the population, 
and for the elimination of the disturbance which results from the extinction 
of selected units. If, on the other hand, information on the population means 
is of paramount importance, partial replacement or a sub-sample will usually 
be preferable. A more detailed discussion, in terms of the errors to which the 
various estimates are subject, is given in Section 8.8. 

There are two further points which must be borne in mind in connection 
with sampling on successive occasions. Firstly, repeated re-survey of the 
same units may be inexpedient, since resistance to the provision of the necessary 
information may be engendered, and secondly, repeated re-survey may result 
in modification of these units relative to the rest of the population. This can 
arise in many ways. In a survey of agricultural practice, for instance, visits 
to farms may result in the farmers concerned improving their practice through 
advice from the investigators : advice which, if asked for, can scarcely be refused. 
A more subtle example is provided by the 1942 Census of Woodlands. In this 
census it was considered that if the subsequent fellings and replantings were 
recorded on the sample areas then surveyed an adequate measure of the changes 
in woodland throughout the country might be obtained over some considerable 
period of time. It has since been suggested that the amount of felling on 
the sample areas may have been affected by the fact that survey information 
was available on these areas and. not on others, with the result that these areas 
have been more intensively exploited. 

3.19 Composite sampling schemes 

Simplicity and uniformity of sampling procedure is obviously in general 
desirable, but there are occasions on which different methods of sampling are 
required for different parts of the population. In sampling a human population, 
for instance, some form of area sampling may be most suitable for the rural 
parts of the country, whereas some form of stratified random or systematic 
sampling based on lists of houses may be best in the towns. There is, of 
; course, no objection to the use of such composite schemes, provided each part 
fulfils the requirements of good sampling procedure already laid down. 
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adequate knowledge of the statistical properties of the material, pilot surveys 
are frequently advisable in large-scale surveys in order to test and improve 
held procedure and schedules, and to train field workers 

Questions arising under heads (1), (2), (3), (4) and (7) of the above list 
are common both to complete censuses and surveys and to sample censuses 
and surveys. Even here, however, the problems encountered differ considerably 
in the two cases, owing to the greater scope for the collection of detailed 
information and the execution of complicated observations by the sampling 
method. r 6 

The determination of the items on which information is to be collected, 
the degree of detail to be attempted, and the ways in which the information 
can best be obtained, often constitute the most difficult and crucial part of the 
planning of a survey. No amount of care in the planning of the sampling 
or skill in the analysis will compensate for failure in this respect. A survey 
m which the information collected does not adequately cover the field to be 
investigated at the best provides a partial and incomplete picture, and at 
the worst may be irrelevant or actively misleading. 

Careful consideration must therefore be given at the outset to the purposes 
for which the survey is to be undertaken, the type of information it is proposed 
to collect, and the uses to which the information obtained will be put. In 
the case of large-scale surveys, which are likely to provide information that 
will be of value to a number of different organizations or government depart¬ 
ments, a detailed statement on these points should be prepared. In this way 
those who are likely to want to make use of the results of the survey will be fully 
apprised of its nature, and can if necessary make suggestions for modifications 
before the survey is begun. 

The statistician who will ultimately be responsible for the analysis and 
presentation of the results should, if possible, be selected and appointed at 
the planning stage. Similarly if the advice of a statistical expert is to be sought, 
this should be done, in the first instance, at the planning stage. This rule 
applies even in the simplest types of census. It frequently happens that such 
censuses are undertaken without any prior consultation with a statistical expert 
whose advice is only sought when the results have been collected and the 
stage of analysis is reached. 


4.2 Definition of the population 

The categories, or types of material, which require to be included in a 
survey, and its geographical scope, are conditioned in broad outline by the 
purposes of the survey, and by administrative and research requirements 
related to these purposes. Within this broad outline, however, there is often 
a certain amount of latitude, and c areful consideration sho uld therefore be 
Sil£Ei9J.beJnclusion.Qr omission of marginal categories, particularly those on 
which the collection of information is likely to be specially difficult, orfor which 
is Jacking. By excluding unimportant marginal categories 
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the task of collecting the information may often be very materially simplified, 
without seriously reducing the value of the results. 

A census of the human population residing in a given territory, for example, 
should ideally include all individuals present in that territory at a particular 
moment, and in simple censuses an attempt is usually made to attain t is 
end It is often, however, difficult to obtain information on certain minor 
categories, such as nomads. These difficulties occur even in a complete census 
but are often more marked in the case of a sample census. The question of 
whether such categories may be omitted entirely without serious loss should 
therefore be considered. 

The matter becomes of even greater importance when a human population 
census requiring the collection of detailed and complicated information is 
undertaken, using skilled investigators making visits to individual members 
of the population. In such cases visits to members of the population with a 
permanent residence, even if they are absent from their residence at certain 
times, are relatively simple, but it is far more difficult to cover the floating 
elements of the population. The conduct of such a census becomes very much 
simpler, therefore, if these latter elements can be omitted. 

In a similar manner, in the case of an agricultural census, the determination 
of the areas of the various crops might ideally require that all areas of the crops 
grown within the boundaries of the territory should be included. It may, 
however, be possible to exclude small areas, such as those found in gardens 
and holdings of very small size, without seriously reducing the value of the 
information. The agricultural censuses of England and Wales, for example 
which are based on returns from farmers, exclude all agricultural holdings of 
less than one acre, and do not attempt to take account of crops grown in 

private gardens or allotments. . . , , 

The question of whether or not minor categories should be included 
depends mainly on the purposes for which the information is required A 
case is sometimes made for the inclusion of certain categories on which the 
information is intrinsically of little interest in order to ensure comparability 
with the results of previous censuses or surveys, or with the results of parallel 
surveys in other countries. Comparability within and between statistical 
series is obviously desirable, and lack.of it can seriously reduce the value of 
the results, and also increase the labour of statistical analyses and the danger 
that those unfamiliar with the details of the various sources of information 
may draw wrong conclusions. Nevertheless when introducing a radica y new 
method of collecting information, such as replacement of a complete census 
by the sampling method, excessive weight should not be given to past practice. 
It should not be forgotten that so-called complete censuses are often m 
themselves subject to errors of various kinds, including lack of completeness, 
and that such errors are often a greater source of disturbance to comparability 
than the omission or inclusion of a few minor categories. If there is any serious 
doubt whether a given category should or should not be included this may be 
regarded as prima facie evidence that the category in question differs m 
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This basic problem is essentially the same in complete censuses and sample 
ce„Il»d »rv„s, but the problem is m„,e 

censuses and surveys, since the items of information that can be couecrea 
Tnd the observations that can be made are themselves more complex and 

V3r The best way of arriving at a satisfactory solution of thw basic problem 
■ q lls „a11v as follows. In the first instance, the details of the information 
required \o deal with the problems originally propounded are determine . 
The que^fon is then considered whether there are any related^ P-Wems of 

importance on which this ZflSZZLy items of 

infmmatlon required fofthi full elucidation of these additional problems should 

exnedient to place on the investigators and respondents in a single surv y 
? The details of this process vary greatly in different types of survey, 
the ^eneral^rhiciple tofollow in i types of survey is to see|that the iems 
of information collected form a rounded whole covering a definite subject 

'^This prTn U c^e f i S s U of'particular importance in surveys of questionnaire 
type on human populations, whether the 

respondents themselves, or the information is elicited b y^ ldn vet,gat^ 
Accurate information can only be obtained in such surveys « ^ 

CO oneration of those providing the information is obtained. The survey 
therefore have a clear purpose which can be explained to the res P^^ 

deluded, rP or "if the questions 

Zgfgh »s rjp£ 

The matter is of importance even m enquiries which q 
of factual information by observation and measurement b y theinve ®^ a 
themselves without any co-operation from respondents. If the held 
tZ tiglSs are not imbued with a sense of the importance of their enquiry 

^ ^Occasionally in cases in which a questionnaire would otherwise be unduly 
Ions it may be’possible to split it into parts, obtaining information on one 
Iroup oHtems from one se/of respondents, and on another group of items 
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S^n 8 T n ? S6 \, C ,f rtain baSic kemS of infor mation will be required from 
The JarnDW^r 1 tW ° T Wi “ ^ 3 pair ° f interpenetrating samples 
firt nL P f h a SO r 1;WO ' phase structure, the basic information acting as 

-r oth S T- ? this procedure is foIlowed > S; 

the relationships between items of information in the two groups can only 

other spiabk s-« xs 

pXtxxx, ‘ofoZfzz' «4,:t r hi ensure ,he 

Xf asked whether they prefer coa,, sas or XX 

for their preferences, it is essential to ascertain in some detafl what expSenc! 
they have had of methods of cooking other than the one they are now using 
mcluding the type of apparatus used. If this is not done the answers may be 
more an indication of the effectiveness of an advertising camSTLTJoS 

rather 6 th J ’ ” a condem nation of antiquated pieces of apparatus 

rather than any reflection of the true relative merits of the different methods’ 

nformation is also often required cn items which, though not of primary 

of the rlufeto^rfnc H 3 K y ^ thcreby e ™ hle the Precision 

Tn r K I b , lncreased b y the appropriate methods of estimation 

outline ^ ng ? C deC1S10ns on the ^ of information required, both in broad 

experts on 4 “ abs ® lutely essential to w °rk in collaboration with 

experts on the subjects which it is proposed to cover. If research or 

administrative experience in the subjects to be covered is lacking itls fetallv 

easy when designing a survey to omit some vital items of informal A simole 

instance of such omisswn is provided by the 1921 and 1931 Population Censuses 

of the United Kingdom. In these censuses information on age of mother at 

19U ?v’ 3nd t0td number of chlldren born, which had been Obtained in the 
J911 Census, was not asked, with the result that- ttw* r • r • 

provided by these censuses for studies on changes of fertility'of .u” ° rn ? at ! on 

s ix s “°" si ? red "“ d - as . ...uhTXS'Xlisr/r” 

T * • ^ ecessar y t0 institute a special Family Census in 3 946 (Section 4 19) 

e X dX ‘XXX' ,he X f » < h “ “X y 

Of cXlXX,"® ” ,ght m “ St Clearly h " C sspoa 

tl ip 1 nl a add f lti0n n t0 direCt collab °ration with experts in the various subjects 
plans for the survey should be circulated at all stages of development to 

L T h TT and ,r dividuak Wb0 are bkefy to be InSed fo 

he results. This will usually result in requests for the collection of 

supplementary items of information, some of which may not be “ces^v for 

the msuhsm be^sed T ^ ° riginaIly pIanned but which wiI! ena ble 

r i °f °^ er P ur poses. In this way the usefulness of the 
vey may often be considerably increased. On the other hand the danger of 

X°L7uXd u S»T, i,h 7 X c,i “ of r cel '“ eous iK ™° f 

reviewed ^ Sh ° uld therrfore >* ™ry earefuliy 
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SECT. 4.4 

4.4 Inter-relations of groups of natural units 

If the physical inter-relations between the members of groups of natural 
units oTthe material under survey are of interest, or if information is required 
for groups of natura l units as a whole, then information must be collected 
for S groups as a whole, or at least for pairs of units from such group.. 
Thus if the inter-relations between the different members of a household 
require to be studied, it is essential to have information for pairs of individuals 
belonging to the same household, and it is usually best that the informa on 
should cover the whole of a household. This can br ensured by using 

households or dwellings as sampling units. . 

Another type of natural aggregate for which it is often important to obtain 

results as a whde is that provided by towns, villages, etc., and, ln , agn ^ 
surveys homogeneous geographical areas. This often calls for the adoption 
of multi-stage sampling, the natural aggregates forming the sampling units 
at the first stage, even in cases where the use of single-stage sampling 
otherwise preferable. Thus in a survey of a human population it -ay 
of considerable interest to contrast the results for individual towns offoff• g 
types and to study the inter-relations existing within a single town eve 
Jhen there is no need for all the towns of the country to be covered. 

Similarly if inter-relations between the behaviour of the same individuals 
or other natural units at different times are of interest, the survey must be 
designed *so as to provide information covering an adequate period of time. 
Thus in an investigation into hours of sleep of children, it is of little value to 
Itermfoe the anufunt of sleep of a sample of children each for a single day 
Such data will throw no light on the question of whether children who have 
a short period of sleep on a particular day tend also to go short of sleep on 
other days or are able to make up for this short period by longer P enods 
preceding or following days. In the same way, studies of nutrition in which 
foe intake of food is determined for each individual for a single day only 
although they will show whether a group as a whole is under-nourished, are 
incapable of revealing the degree of variation m under-nourishment betwee 
individual and individual, since individuals going short of food on a P artic ^ 
day -ay make up for such deficiencies, in whole or m part, on succeeding 

^We have stressed this point at some length because there has been a tendency, 
in surveys on human populations of the questionnaire type, to take the 
easv course of asking those interviewed about occurrences which are stil fresh 
in their mind, e.g. what happened on the previous day. This course is f ° 1Io ' ved 
for various reasons. It may be considered that information provided about 
earlier occurrences will be inaccurate, or that there is a danger of ^burd^mg 
respondents if an attempt is made to cover too long a period , or the object 
may be to save interviewers the trouble of repeated visits which might be 
required to cover a period of time accurately. Actually it has been found that 
the use of a very short period does not necessarily lead to accurate average 
results 6 ; in certain circumstances there may be a tendency on the part of 
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respondents, either consciously or unconsciously, to telescope events, and 
report them as happening in the given period when in fact they happened 
earlier. Thus a survey of crockery breakages made by asking what breakages 
occurred over the past week led to an entirely excessive estimate of the amount 
of breakage, whereas a similar survey asking for breakages over the past year 
gave results which checked well with production figures and the domestic 
stocks (Box and Thomas, 1944, D, discussion). 

4.5 Practicability of obtaining the required information 

So far we have been considering the problem of determining what items 
of information are required in order that the purposes of the survey may be 
fulfilled. Each item must, however, be considered in the light of the 
practicability of obtaining it. If the information is to be furnished in response 
to questions, the points for consideration are whether the respondents are 
sufficiently informed to be capable of giving accurate answers ; whether, if 
the provision of accurate answers involves them in a good deal of work, such 
as consulting previous records, they will be prepared to undertake this work ; 
whether they have motives for concealing the truth, and if so whether they 
will merely refuse to answer, or will give incorrect replies. If the information 
is to be obtained by observation or physical measurement, the points for 
consideration are whether the observations are such that they are within the 
competence of the investigators or other individuals who will be required to 
undertake them ; whether they will make excessive demands on the. time of 
the investigators or others, or require excessively expensive apparatus ; and 
whether the owners of the surveyed material will permit the observations to 
be made. 

Considerations of this kind will inevitably lead to modifications of what 
would otherwise be considered an ideal scheme. Nor can general answers be 
given, even within the limits of a particular field of enquiry. In countries 
such as the United Kingdom, for example, there is no reason to suppose that 
any large amount of inaccuracy is introduced into the returns of the population 
censuses by deliberate mis-statements. In countries not accustomed to 
population censuses fear that the information will be used for such purposes 
as taxation or conscription may lead to considerable inaccuracies. Similarly 
in crop-sampling work the use of small sample areas may be quite satisfactory 
with certain classes of field worker, but, as is shown by Table 2.5, is entirely 
unsatisfactory in other cases. 

When the ideal requirements cannot be fully met it is sometimes possible 
to include other items of information, observations, or physical measurements, 
which, owing to their high correlation with the quantities which it is desired 
to determine, will serve as more or less adequate substitutes for these quantities. 
These substitute measures may be used for purposes of stratification or 
classification of the data in the final analysis, as for example when the rateable 
-value of a dwelling is used as a substitute for the income of the household 
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occupying it ; or they may be substitutes for measures of quantities which 
themselves require assessment, as for example the use of eye estimates in place 
of direct measurements of the yields of a standing crop. The efficiency of such 
substitute measures can only be properly judged by a proper statistical 
investigation of the relations between them and the quantities for which they 
are substitutes. In the case of substitutes for measures of quantities that are 
to be assessed, some method of calibration is essential if objective estimates 
of the original quantities are to be obtained. (The calibration of eye estimates 
is discussed in Sections 6.15 and 7.14.) 

It will inevitably happen in certain cases that information which is of 
considerable importance will prove to be unobtainable, or unobtainable with 
sufficient accuracy. When such a situation arises it must be squarely faced. 
There is at times a tendency to attempt to collect information which, because 
of its nature, cannot be obtained with the necessary accuracy, and then 
to condemn the survey method in general because the results are of little 
value. 

This, however, does not mean that the collection of difficult items of 
information should not be attempted. The sample survey procedure, because 
it makes possible the use of skilled investigators working on a relatively small 
sample, is frequently capable of eliciting reliable information on points which 
it would be quite impossible to include in a general enquiry. The fact that 
the enquiry is on a small sample, if known to the respondents, frequently 
makes them willing to give information which they would certainly not be 
prepared to give if the enquiry were general. In such cases it is important 
that the investigators should themselves be recognized as impartial and 
disinterested ; in particular they should not be officials of an organization 
which itself might make use of the information obtained to the detriment 
of the respondents. 

Nevertheless there are subjects on which it is impossible to collect accurate 
information from a random sample of the population. In certain of these 
cases information can be collected from a selected group of individuals, 
e.g. individuals with whom social welfare workers are in contact. Information 
of this type is not necessarily valueless, but it must be clearly recognized that 
it is not the equivalent of information obtained from a random sample of the 
whole population, and any attempted generalization of the results will be of 
limited validity. 

Attempts are sometimes made to obtain a sample from such a group of 
individuals which conforms more closely in certain respects to the population, 
e.g. in classification by age or social class, than does the group as a whole. 
While this may improve the sample somewhat, it still does not provide the 
equivalent of a random sample. On the other hand, if the whole of the group 
is not required, it is usually advisable to apply some rigorous form of selection 
rather than to permit the workers themselves to select individuals for 
investigation, as the latter procedure will merely introduce further unnecessary 
elements of bias. 
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In cases in which some of the items of information are difficult to collect, 
multi-phase sampling may be of value. It may, for instance, enable specially 
skilled investigators to be used, for the more difficult items. Thus in a health 
survey medically qualified investigators may be used on a small sub-sample 
of a much larger sample on which more general items of information relating 
to health have been collected. Equally it may be used to reduce the work 
required to manageable proportions. Thus, in the Survey of Fertilizer Practice 
soil samples for chemical analysis were taken from one old-arable field., one 
new-arable field, and one field of permanent grass on each farm, these fields 
being a sub-sample of all the fields on which information on the use of 
fertilizers was obtained (Section 4.23). 

4.6 Methods of collecting the information 

The methods of collecting the information are to a large extent conditioned 
by the material under survey and the type of information required. Where 
the alternative possibilities exist, it may be stated as a general rule that 
observations are preferable to questions, and questions on facts and on past 
actions are preferable to questions on generalities and on hypothetical future 
conduct. Thus it is better to inspect a house to see if it shows signs of damp, 
than to ask the occupant if the house is damp ; and it is better to find out 
what considerations, from among the various alternatives (if any) that presented 
themselves, governed the selection of the house in which the occupant is living, 
rather than to ask what type of dwelling—house, flat, bungalow, etc. is 

“ preferred.” * 

On the other hand, it is scarcely possible to state any general rule with 
regard to physical measurements and qualitative observations made by the 
investigator. Physical measurements are more objective, but qualitative 
observations are often more capable of summing up the salient features of a 
complex situation. Thus a qualitative grading by the investigator of the degree 
of dampness of a house is likely to be more effective than any physical measure¬ 
ments designed to determine the degree of dampness. Moreover, by proper 
standardization and calibration among investigators qualitative observations 
can themselves be made objective. 

When the information is collected by means of a census form or question¬ 
naire the questions which are to be asked should be considered at the planning 
stage, since the information obtained will depend on the exact form of these 
questions. Equally the exact form of any observations and. physical measure¬ 
ments which are required should be determined. v si * 

Census forms and questionnaires may be designed either for completion 
by the respondent with little or no assistance from investigators, or for 
completion by the investigator by the aid of questions put to the respondents. 
In questionnaires of the latter type the investigators may be instructed to 
ask questions with a given form of wording, or they may be instructed to elicit 
information which will provide an answer to the questions of the questionnaire 
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by enquiry and discussion without adherence to any exact form of words. 
Both means of eliciting information may be required in the same survey for 
different items of information. 

Census forms and questionnaires designed for completion by the respondent 
may be delivered and returned by post, delivered by post and collected by 
an enumerator or investigator, or vice versa, or delivered and collected by an 
investigator. Use of the post is clearly most economical, and is the method 
generally followed in censuses and surveys of industrial and commercial 
organizations, such as censuses of production. In such cases the use of investi¬ 
gators will not normally have any great advantage over the post, either in ensur- 
ing more complete response or obtaining more accurate information, though 
occasionally in local surveys investigators may be used to explain the purposes 
of the survey and persuade the respondents to co-operate. In population 
censuses, however, investigators are normally used both in order to ensure 
the maximum response, and to give assistance where necessary in filling up 
the forms. Censuses and surveys of small-scale industrial and commercial 
organizations, and of farms, occupy an intermediate position, and the method 
used will depend to a large extent on local circumstances. 

Attention must be paid to the detailed wording of all questions, even if 
these are only intended as guides to the investigator. If the question itself 
creates a wrong impression in the mind of the investigator this will undoubtedly 
lead to errors, even if additional explanatory notes indicate that something 
else is really required. 

Careful thought must also be given to the order of the questions. If questions 
are arranged in an orderly sequence the investigator’s task is much easier, and 
the respondent’s reaction is likely to be more favourable. This applies to all 
forms of questionnaire, but is most important in the verbal questionnaire. 

In many types of survey it is profitable to give the investigator or respondent 
an opportunity of making general remarks on special points. This can be done 
very simply by including a space for observations. Some guidance should 
be given on the type of observations required. Although such observations 
do not easily lend themselves to exact analysis they are frequently of considerable 
value in drawing attention to relevant facts not covered by the questionnaire itself. 

The type of investigator to be employed must also be considered. 
Investigators should have a background knowledge of the subject under 
investigation, particularly in investigations of the research type. In a technical 
investigation into housing conditions, for instance, the investigators should 
have some knowledge of housing construction and of standards normally 
adopted in good practice. This requirement of technical knowledge in the 
investigators limits the scope of unspecialized teams of investigators. Such 
teams are suitable for carrying out ad hoc and routine investigations which 
require only relatively simple questionnaires, but they are no substitute for the 
more specialized teams required for investigations of a research nature involving 
technical questionnaires and observations or physical measurements by the 
investigators themselves. 
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In surveys requiring any high degree of technical knowledge it is usually 
est either to use members of existing organizations, or to appoint a small 
specialized team of technically qualified research investigators. The various 
surveys into the technical and economic aspects of agricultural practice in 
England and Wales, for example, are carried out by the staffs of the National 
Agricultural Advisory Service and the Provincial Advisory Economists. By 
this means teams of investigators are obtained who are technically qualified 
and capable of discussing the problems involved with the farmers • at the 
same time the investigators themselves gain a wider knowledge of the farms 
ot their district which is ol value to them in their other work. 


^ Methods of dealing with non-response 

Unless non-response is confined to a small proportion of the whole sample 
the results cannot claim any general validity. Every effort must therefore 
be made to reduce non-response to negligible proportions. 

Non-response is usually most serious in postal questionnaires. Delays 
m response can also sometimes be very troublesome, particularly when the 
results are required quickly. A rigorous system of dealing with failure to respond 
and delay in response must therefore be instituted at the outset. The first 
step is to send a follow-up letter, but if this does not produce the required 
effect, the possibility of using more intensive methods such as telephone calls 
and personal visits must be considered. These will require a special regional 
organization. r B 

In censuses of industrial and commercial undertakings in which data on 
production, sales, labour force, etc. are required for the purposes of economic 
planning it is usually possible to make the returns compulsory. This is often 
a help m dealing with a small minority of recalcitrant institutions, particularly 
if pressure can be brought to bear in other ways, but it is no substitute for full 
and willing co-operation by the majority. Complete population censuses 
are usually also made compulsory, and there appears to be no logical reason 
why sample censuses of the same type should not also be compulsory. While 
this is little help in dealing with obstinate refusals, since the census authorities 
are not likely to wish to bring the offenders before the courts, it is an indication 
that the government regard the census as of importance, and to this extent 
is likely to act as a persuasive force with the waverers. 

In censuses which are to be repeated at intervals it is particularly important 
to deal vigorously with non-response and delay in response at the outset as 
otherwise they tend to increase progressively. If any large volume of non- 
response persists, or if there is any serious delay in making the returns it is 
an indication that something is wrong with the census, which should either 
be reorganized or abandoned. 

In sociological surveys using the interview method the amount of deliberate 
non-response is usually small. If it is not, the questionnaire and the type of 
investigator used should be reviewed. Revisits by special investigators 
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can be tried, but are not likely to be very effective. In technical surveys of 
agriculture involving interviews with the farmers the amount of deliberate 
non-response is also usually small, unless the amount of information required 
is such that it puts too heavy a burden on the respondents. 

In sociological surveys, however, initial non-response due to failure to 
contact the respondent can be very troublesome. There is no proper way 
of dealing with this except by persistent call-backs. The number of call-backs 
can often be reduced by enquiring of neighbours when the respondent is likely 
to be at home, or where he can be found so that an interview can be arranged. 
Call-backs are also required because the respondent, though willing to give 
the information, is otherwise engaged at the time of the first call. 

The amount of work involved in follow-ups and call-backs can be reduced, 
if this appears desirable, by taking a sub-sample of those not contacted at t e 
first (or subsequent) call, and weighting up the sub-sample m the final results. 
In repeated censuses, however, complete follow-ups are advantageous in 
encouraging better response to later censuses. 


4.8 The frame 

The whole structure of a sampling survey is to a considerable extent 
determined by the frame. The methods of survey which are suitable for a 
given type of material may be radically different in different territories because 
different types of frame have to be used. Consequently, until particulars 
of the nature and accuracy of the available frames have been obtained, no 
detailed planning of the survey can be undertaken. If no frame exists, the 
construction of a frame suitable for the purposes of the survey may well constitute 
a major part of the work of the survey. 

Frames are subject to various types of defect, which may be broadly 

classified as follows. A frame may be . 


(1) Inaccurate. 

(2) Incomplete. 

(3) Subject to duplication. 

(4) Inadequate. 

(5) Out of date. 

A frame may be termed inaccurate if information about the units listed 
in it or defined by it is inaccurate. The term may also be used to cover the 
listing of units which do not in fact exist. Thus a ration-card list in which 
certain women were incorrectly described as married when they were in fact 
single, or in which certain individuals were included who had died, would be 

inaccurate in these respects. . , 

A frame may be said to be incomplete when certain units of the material are 
omitted entirely, and be subject to duplication when certain units of the material 
are included more than once. Thus a ration-card list in which certain individuals 
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were not included, and others were included twice, would be both incomplete 
and subject to duplication. 

A frame may be termed inadequate when it does not cover all the categories 
of the material which it is desired to include in the survey. Thus a ration-card 
list which did not include the temporary residents in a district would be 
inadequate for a survey of the population of that district in which it was 
necessary to include such temporary residents. 

A frame, though accurate, complete, and free from duplication at the time 
it was constructed, may no longer be so at the time it is required for use. Such 
frames may be said to be out of date. Errors of all the first three of the above 
types may be introduced through the frame being out of date. 

These different types of defect have very different consequences in the' 
defects they introduce into the sampling process. Inaccuracy in the frame, 
in so far as it relates to the selected sampling units, will automatically be 
discovered and corrected as the survey progresses, and consequently will not 
invalidate the results. If the information contained in the frame has been used 
as a basis of stratification, etc. or as supplementary information, inaccuracy 
in this information will result in somewhat lower accuracy in the results, but 
the actual accuracy attained will be assessable from the results themselves. 

Incompleteness in the frame will not be discovered in the course of the 
survey itself, and to the extent to which a frame is incomplete the population 
or material will fail to be covered. Incompleteness is likely to be more serious 
than it appears to be at first sight, since it is often confined to units possessing 
some special characteristics, which may in consequence be seriously under¬ 
represented in the sample. Duplication has a similar effect, since the dupli¬ 
cated units will have a double chance of being included in the sample. There 
is the difference, however, that incompleteness cannot be determined or set 
right by an examination of the frame itself, whereas duplication may under 
certain circumstances be detected and corrected by such examination, though 
this will almost always be a tedious operation. If the sampling fraction is 
large and the degree of duplication is also large, the duplication may come 
to light m the course of the survey. Thus, with 5 per cent, duplication and a 
sampling fraction of 1 in 10, two out of every 210 units in the sample will on 
the average constitute a duplicate pair. With a sampling fraction of 1 in 100, 
however, only two out of every 2100 units in the sample will constitute a 
duplicate pair. 

A frame which is inaccurate for certain purposes may be incomplete for 
ot ers. Thus a ration-cai d list in which some of the single women were 
described as married would be complete, though inaccurate, if used as a frame 
for a survey of all women, but would be incomplete if used as a frame for the 
survey of single women only. Such incompleteness could be remedied by 
taking a sample covering all women, and rejecting those members of the sample 
who were found on investigation to be married. 

Inadequacy of the frame will usually be known before the survey is under- 
ta en from the specification of the frame itself. Inadequacy can in general 
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only be dealt with by the construction of a subsidiary frame for the omitted 
categories. 

In actual practice, frames are likely to suffer to a greater or less extent 
from all of the above defects. It is therefore essential at the outset of the survey 
to carry out a careful investigation of any frame it is proposed to use, since 
many defects are not at all apparent until a detailed investigation has been 
made. Such an investigation will naturally commence with a study of the 
administrative machinery by which the frame has been constructed and by 
which it is kept up-to-date, but may also have to include a certain amount 
of field work. 

4.9 Frames suitable for censuses and surveys of human populations 

Human populations have a tendency to aggregate in towns and villages, 
often with very high local densities, which makes any form of area sampling 
based on maps and plans subject to high variability, unless a very elaborate 
sampling procedure is adopted. This is most serious if the total numbers 
are not known, and require to be estimated from the sample, but even the 
proportions of the population falling in different categories will be subject 
to substantial errors, since different classes of the population tend to be concen¬ 
trated in different areas. 

Three very different types of survey of human populations may be 
distinguished. These are: 

(1) Surveys of the census type, requiring the collection of relatively 
simple facts, but covering the whole population, and capable of giving 
separate results for small administrative areas. 

(2) Surveys covering the whole population of a country, and capable of 
giving reasonably accurate estimates for the whole population, and 
possibly for certain broad subdivisions, but not for small administrative 
areas. Such surveys often involve the collection of more detailed 
and elaborate information than do those of type (1). 

(3) Local surveys covering a particular town or rural area, or a few 
contrasted towns or rural areas, in which no attempt is made to obtain 
a sample which is fully representative of the country as a whole. Such 
surveys almost always involve the collection of detailed information 
by field investigators. They are usually investigations of a research 
nature, and may be precursors of simplified surveys on the same 
problems covering the whole country. 

Surveys of the first type present relatively simple sampling problems, 
and relatively complicated administrative problems. The sampling, since 
it has to cover small subdivisions of the population, must generally be single- 
stage, usually with stratification and a uniform sampling fraction. Surveys 
of the third type are also relatively simple ; since only limited areas have 
to be covered, a one- or two-stage sampling process usually suffices. 
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applicant for a new ration book should finf^^tlorr M locaTcharYges of address. 

This, however, did not ensure imme ‘ a e ^8. f the remova l necessitated change 

were often only revered a, .he toe 

defects. Consequently neithei of these 8 single food-office district, 

the sampling of small admimstratie of populat i on were frequent 

particularly during the war w the other hand> they were and are 

^ a sampie census of the 

Wh theTo“ffice card index was used - £ ft"" 

Census of the United Kingdom. & providing, for married women, 

Commission on Population, with e o j j birth f a q children, 

information on age, age at “ 8e > ^ £? never previously been 

and husband’s occupation, m orm married women (including 

collected in full. A sample of 1 m 10 « f a ^he m ^ card> and 

those widowed or d.vorced w^ MenM^ev^^ g ^ ^ ^ the 
recording the name and address women selecte d by this process 

C: “uesto". Zfto questionnaire •• unowned.” Quesuonnarre, 
: despatched by post, and collected by ““of the old tod-office 

Since ffiere is the tod office issuing 

card on removal—this is eltectea y wlt h remova ] s . 

the new counterfoil.-specnd steps,had » be «k» » ( ^ ^ daM when 

This was effected by tong a aero date at a P somewhat longer 

the sample was taken. Theinterval be receive d a. the 

than the time taken (or notification of chango“ changes of addres8 

old office. Thus virtually all duplicate cards P Registrations effected after 

prior to the aero date would,have been removd. R g llltio „ bearing 

the aero dam were excluded from the sample, and^ ^ mw 

"addtos 0 Lffig toTd” “-be seen tha, by ffiis procedure all individuals 
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°n,y and .1* only inlands 
reason delayed thX ""Sfji'' ‘Z T" ' h “ e Wh ° h>d for ““ 

addreas had not yet been received by th/'.Jd'effi!” "S"” °! cta «? “f 

all duplication except in the rare l c . c f* This procedure avoided 
offices, while obviating he nee T f eXCeSS1Ve dela y in notification between 
..-da,; non-duplicatetf index * * a " emP ' *° » My up- 

4.11 Frames from complete population censuses 

sssr s s.’s? 

■ Hr ~r ir - 

in H 

s?a^ffia;. h r»S ?S= 

5 rl=“=£- 

individuals “ d ». - d no, 4 

A complete census will provide a verv suitahle fram^ r • t 

=S--i"^t^SFSS^ 

co»s,i,“ed S 7e lr^1aZ P “ ng ' ” ,h eX “ ptlon ' ha ' * he fal P h> “ 
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Five different types of form were used, with the marked lines distributed in 
the manner shown in Table 4.11. 

Table 4.11-Sampling line numbers in the 1940 U.S. Population Census, 

AND THEIR PROPORTIONS 


Style 

Proportion I 


Line 

numbers 


V 

16 

14 

29 

55 

68 

W 

1 

1 

5 

41 

, 75 

X 

1 

2 

6 

42 

77 

Y 

1 

3 

39 

44 

79 

Z 

1 

4 

40 

46 

80 


The investigators were instructed to enter the names of each family in a 
defined order, Ll to complete all lines of the form before commencmg^new 
form. Actually these instructions were not always adhered t , 2 p 

of the last lines (Nos. 40 and 80) being found to be blank. If Ae f 

over the earlier lines, which are not marked on the W-Z forms, but not as far 
as the lines marked on the V form, this will lead to a slight deficiency m t 
proportion in the sample of entries in line 1 and the other lines marked on 
the W-Z forms. This disturbance, however, is only very small, but any tendency 
of the investigators to alter the order in which the names were entered so as 
to secure a suitable person for supplementary questioning could easily g 

rke to more serious biases. . , t r 

The danger of this type of bias is always present in this method of sampling, 
and can only be overcome by the most rigorous training of observers, and 
the imposition of rules which determine uniquely the order in which names 
are entered on the list. 


4.12 Frames from lists of households or dwellings 

Lists of households or dwellings are frequently available fromjich 
sources as rating offices, electoral registers, etc. Frames based on such lists 
are in many ways preferable to frames based on lists of individuals. As already 
mentioneTin most surveys in which the information is co lected by persona 
visit it is advantageous, and often essential, to collect information from a 
members of a household, in other words to use households as sampling units. 

Frames consisting of lists of dwellings also have a much greater degree 
of permanence, being unaffected by movements of the population. Such 
frames, if complete at the time of their construction, will only become 
to the extent that there is new building, or changes m the use of existi g 
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buildings. New building is necessarily a slow process, and the listing of new 
buildings usually presents no serious difficulties. 8 

Lists of households can generally be utilized to give a frame of dwellings 
/ . mg 3S * he sampling umts the dwellings occupied by the households at 
he time the frame was constructed. Certain special precautions are required 
to ensure the inclusion of dwellings which were unoccupied at the time the 
hst was prepared. In a town in which the dwellings am arranged inTreets 

ffiffir !lt Whl WL the L 1St i! S a s ° arranged b y streets this presents no particular 
y. hat has been called the half-open interval can be used The 

procedure is as follows. When drawing the sample, the dwelling unit appearing 

“ , h f IlSt .*? , the selected unit is recorded, and the field investigator if 

Ind if^n t l ^ th T 18 any ° ther Unit ° n the ground bet ween these two Units 

let L l at Umt m the Sam P le ' Thus field investigator might 

ceive the instruction to survey No. 9 in a certain street, with No. 13as the next 

recorded unit (odds only). If on visiting No. 9 he finds that No. 9A and N^ U 
also exist these are also surveyed. The even numbers between 9 and iTare 
not included, smce the instruction “odds only” indicates that they lie onthe 
opposite side of the street. This procedure is clearly only possible if the list 

th^grmmd ' Iftiie^ ^ C ° rreSpa ? ds t0 some geographical pattern on 
the ground If the hst is not so arranged, incompleteness of the frame cannot 

be corrected by the use of the half-open interval or analogous procure In 

rearrangement ofthf 7 inhaVe t0 be ame " ded b y oth er means, and complete 
angement of the hst m some geographical order may be necessary. 

eXam P e . of tbe use of this type of frame is provided by some surveys 

Mini fy"" 8 o' 6 War m Certain . towns in the United Kingdom by the 
y o Home Security, to investigate disturbances to the population on 
ccount of air raids. In the English towns electoral registers were used 
s frames, and in the Scottish towns rating lists were so used. The electoral 
egisters consist of printed lists of voters arranged by streets in order of 
welbngs, all voters in one dwelling appearing together. Each dwelling therefore 
has as many entries as there are voters. Consequently, selection o/entries in 

lithe H W ff h P n° bablllt 7u Wi11 n0t g ' Ve an ec l ual Probability of selection 

JL hi f C T hi® lmg a u ThlS C ° uld have been over come by subdividing 
the list into dwellings, and basing the sampling on these dwellings 8 

As the surveys had to be conducted at considerable speed, deify in selection 

of the sample was avoided by the device of examining every xth entry and in 

'’ 8 ™, the T7 U if “ ,ry " ferr ' d “ ,he n»«mb„ 

rh P h dwellln g- This introduces a certain additional discrepancy between 

L Ti g ^ VX ’ and Ae 3CtUal fraction of dwellings incffided 

“ P ’ 7 introduces errors that are appreciable relative to t£ 

nhi 1 "! k rr ° rS /° r i estlmates of such quantities as numbers in the population 
nrimi by mult,pI / m g the sample total by *. Such estimates were not the 

reqffired "if n^esf ‘ ^ SUrVCyS ’ ^ conse fi uentI y no adjustments were 
i . T’ errorS ansmg from this cause could have been eliminated 
subsequently by ascertaining the ratio of the number of dwellings incIudS 
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in the sample to the number in the whole register, and treating this ratio as 

the in U thi S sTu P rvey theTethod of the half-open interval was used to deal with 
dwellings not included in the list, and was found to be quite satisfactory. 
Had there been new housing estates not covered by the register t ese wo 

have had to be dealt with separately. , * 

In certain towns the separate flats of blocks of flats were not listed, and 
therefore presented a difficulty, since the blocks constituted very large 
units whose chance inclusion or exclusion would have materially increased 
the sampling error. The existence of blocks of flats was, however, always 
apparent from the large number of voters appearing under the same addres . 
S bfocks were therefore listed, together with other large institutions, by 
preliminary inspection of the register, and every *h flat was selected by visit 
to all the blocks in turn. 


4.13 Frames provided by town plans 


LU * * --£- ~ 

Town maps and plans provide a useful frame for the sampling of dwellings 
in buTup areas. In some cases there may be detailed maps showing the 
location of all dwellings, but in many cases only street plans, not showi g 

any great amount of detail, will be available. w ;n 

Any town plan which gives an accurate representation of *e streets will 
t j£ rheTwn to be divided up into “ block,," i.t. area, bounded by stress 
Such a plan will, therefore, provide a frame for area sampling in which t 
unks are bfocls. A sample of dwellings can then be obtained by including 
all the dwellings in the selected blocks. In general however the variability 
■ between blocked block is likely to be large even after careful stratificat 
since there is often considerable local segregation of different classes of the 
population. Consequently, two-stage sampling is in general a ^geov i 
blocks being taken as the first-stage units and dwellings as the second g 

UIli To obtain a two-stage sample in cases in which the map does not show the 
location of dwellings, it will be necessary to construct the second-stage frame 
for the blocks selected at the first stage by ground survey. This however, 
is a much lighter task than the construction by ground survey of a frame fo 
all dwellings^ the city, and can frequently be done in the course of the survey 

ltSC In towns in which the natural blocks are of very unequal area, groupings 
of the smaller blocks or subdivision of the larger blocks should be performed, 
£ £ rStae“#!•«, inequalities in area. If little i, known about 
die town it may be advantageous to make a ground survey in order to demarcat 
and stratify the^lock units. It may even be advantageous to carry out a rough 
preliminary count, or make some other rough estimate of the number of dwelling 
Si each block, as this will enable the subsequent selection of the first stage 
units to be made from within strata with probabilities proportional to the 
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estimated numbers of dwellings. This was done in parts of the Greek population 
census described in Section 4.16. If such preliminary estimates are not 
vailable the best that can be done is to make a selection with probabilities 

£rr al l u arC f but )) bIock areas are unlikely to be very closely correlated 
with the number of dwellings, even within strata. In either case the second- 
stage sampling fractions may be taken inversely proportional to the first-stage 
tractions, so as to give a constant overall sampling fraction. 

Whether an elaborate procedure of this kind is needed depends not only 
on the accuracy required but also on whether estimates of total numbers are 
required from the survey. Since total numbers will be highly correlated with 
numbers of dwellings, prior supplementary information on these numbers 
tor the different blocks, even if only rough, will be particularly effective in 
reducing the sampling variability of estimates of total numbers. They are 
not likely to have such large effects on estimates of the proportions of the 
population falling in various categories. 

Sampling by streets is sometimes used instead of sampling by blocks. 
I his is usually not so satisfactory as sampling by blocks, since each block repre¬ 
sents a clearly defined area, whereas if a street is taken, there is often doubt 
as to exactly what is to be included and what excluded: alleyways and court¬ 
yards having entrances from more than one street, and not shown on the street 
map, for example present considerable difficulty if the sampling is by streets. 

Sampling by blocks is particularly valuable for surveys of towns in which 
a , 2 Pe j-^ f bul dlng have t0 b . e covered - Second-stage sampling of any or all 
® u 1 eren * types of building can be adopted if required, by enumerating 
the different types for the selected blocks after the first-stage sample has 
been drawn. r 


4.14 Frames provided by maps of rural areas 

, Tb f.^ Se ° f ma P® as a frame for tJle sampling of rural areas presents some¬ 
what different problems from those encountered in the sampling of towns 
by the aid of town plans. 

If accurate and detailed maps showing all or virtually all buildings are 
available, rectangular areas may be used as sampling units, the buildings falling 
in the selected areas being examined on the ground to see whether they are 
dwellings, with a further examination for unmapped dwellings. 

Sampling with probability proportional to the apparent number of dwellings 
indicated by the map is possible, but would involve counting the dwellings 
in all the rectangular areas. Consequently it is better, if preliminary Work 
on the maps of this magnitude appears to be worth while, to divide the map 
into areas containing approximately equal numbers of dwellings, using natural 
oundanes as far as possible and paying particular attention to stratification. 

t may be noted here that the selection of a point at random on the map 
and selection of the dwelling unit nearest to this point for inclusion in the 
sample—a method which is sometimes used—is* inadmissible, since a unit 
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which is widely separated from other units will have a much greater chance 
of selection than one which is close to other units. , 

With less detailed maps rectangular areas marked on the map will not be 
capable of being demarcated exactly on the ground. Natural features occurring 
on the maps must therefore be used as boundaries of the sampling units. 1 his 
will necessarily give units of differing size. In particular, occasions wi arise 
when it is impossible to subdivide a somewhat large area. In such cases the 
area in question may be taken to represent two or more units. If any of these 
units are selected, a subdivision into two or more parts as alike as possible 
is made on the ground at the time of the survey, and the requisite number 
of parts selected by random choice. 

. In most countries, even in rural areas, there will be a number of villages 
of varying sizes, which are best dealt with separately by some form of 
stratification and two-stage sampling, since if these are included in the area 
sample a high degree of variability will be introduced. The use of a variable 
sampling fraction at the first stage, a larger proportion of the larger villages 
being selected, will be advantageous. A compensating reduction in the second- 
stage sampling fraction can be made if desired. The boundaries of all villages 
will require careful demarcation' as otherwise there will be ambiguity as to 
what should be included in the area sample. 


4.15 Frames from lists of villages 

In undeveloped areas the available maps are not likely to be of sufficient 
acfcuracy for area sampling. Where the population is concentrated in villages 
these usually form the best first-stage sampling units. A list of villages will 
then serve as a suitable frame. 

Even if the majority of the population is concentrated in villages there may 
be a residue located in the intervening countryside. If this residue owes 
allegiance to definite villages, the problem is relatively simple, since all that 
is . required is the identification of the individuals belonging to the selected 
villages. This can normally be done by the head-men of these villages. 

If no such association exists, some form of area sample of the intervening 
areas may be necessary. If rough maps are available, suitable areas may be 
demarcated by tracks, rivers, etc. If no maps are available, some form of line 

sampling may be possible in open areas. 

If the country is not sufficiently open to be easily traversed, the construction 
of a rough map of the tracks may be necessary before any adequate sampling 
of the intervening areas can be carried out. It may be possible, however, to 
Use these tracks without full mapping. Thus all tracks leading from villages 
selected for the sample may be traversed, and dwellings to which they give 
access included up to half-way to the next listed, but not necessarily selected, 
village. Such a method will only be effective with a relatively simple track 
system such as is met with in forest areas: intermediate junction points, for 
example, present special problems. 
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4.16 The 1946 population sample for Greece 

The 1946 population sample for Greece provides a good example of the 
way in which sampling methods of the types discussed in the previous sections 
can be used to obtain speedy census data from a sample covering the whole 
population of a country. (Jessen et al, 1947, C). 

The sam P^ e was taken by the Allied Mission for Observing the Greek 
Election, as part of the investigation of the accuracy of the electoral lists A 
population sample was required in order to test for omissions from the electoral 
ists, and the opportunity was therefore taken of securing more general census 
data. The corresponding test for duplications and redundancies in the lists 
was made by examining the lists themselves and investigating a sample of 
names drawn from them. 

The frame for the first stage of the population sample was that given by 
the 1940 Population Census. This census gave returns for koinotetes , which 
are small communities or groups of villages, and demoi, which are towns and 
cities, usually with more than 10,000 population. Maps were available which 
showed the areas included in these koinetetes and demoi, and the names and 
location of all the populated centres. The koinotetes and demoi were used as 
sampling units at the first stage of the sampling. The units were stratified 
according to their population in 1940, and a variable sampling fraction was 
used. The actual scheme is shown in Table 4.16. Selection from within 
strata was systematic. 

The samphng of the selected first-stage units was based on lists of house¬ 
holds within the area. These lists were either based on existing lists checked 

and brought up to date, or were specially prepared to show the location of the 
households on a map. 

F°r the sampling of towns an Additional stage was used, a sample of 
blocks demarcated on an existing or a constructed street plan being first taken, 
with a further sample of houses from within the selected blocks. Sampling 
was sometimes with probability proportional to estimated numbers of house¬ 
holds, these estimates being obtained by a rapid cruise of the whole area and 
sometimes with equal probability. 

The sampling fraction at the final stage was in all cases adjusted so as to 
give a constant overall sampling fraction. When blocks were sampled with 
probability proportional to estimated numbers of households, this required 
that the samphng fraction within the selected blocks should be taken as inversely 
proportional to the estimated number of households in the block. Thus the 
parish of Agios Panteleemous, which is given as an example, was initially sub¬ 
divided into 98 blocks. Before sampling, some of the smaller blocks were 
combined so as to give 65 combined blocks* The total of the estimated 
number of households was 966. It was decided to sample three blocks, which 
were selected systematically by taking a sampling interval of 322 (= 966/3) 
with a random starting point of 288, using sub-totals in the manner of 

by p^hes^ectionln 6 ^ “ th * sam P lin « of Hertfordshire farms 
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Example 3.2.C. This gave combined blocks containing 23, 18, and 13 
estimated households respectively. Since a sampling fraction of 1/100 within 
the parish was required, the estimated number of households in the sample 
was 966/100 = 10. This number was divided approximately equally between 
the three selected blocks (4, 3, 3), and the sampling intervals were calculated 


Table 4.16— Greek Population Census : Summary of the Sample Design 


Size- 

class 

code 

Population 

in.1940 

Assumed 
average 
population 
in 1940 

Sampling ratios 

Number of places 

For 

selection 
of sample 
places 

For 

selection 
of names 
and 

households 
within a 
sample 
place 

In 

size- 

class 

In 

sample 

For koinotetes 

1 

0- 499 

350 

1/100 

1/5 

2,147 

20 

2 

500- 999 

750 

1/50 

1/10 

2,049 

40 

3 

1,000-4,999 

2,500 

1/20 

1/25 

1,366 

70 

4 

5,000 and over 

7,000 

1/5 

■ 1/100 

54 

10 





Totals 

5,616 

140 

For demoi 

5 

Under 25,000 

17,000 

1/2 

1/250 

52 

26 

6 

25,000 and over 

— 

1/1 

1/500 

22 

22 





Totals 

74 

,48 


by dividing the estimated numbers in the blocks by these numbers, i.e. the 
intervals were taken as 23/4 = 6, etc. This procedure gives the required 
constancy in the overall probability of selection. The actual number of houses 
in the sample will of course differ from 10 if the estimated numbers are in 
error: it is the overall probabilities of selection, not the numbers of houses, 
that must be fixed. 
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The ratio method of estimation was adopted, using the 1940 population 

unirSecZlfel'IT fe " i0n prob,M,y " £* 

umt (bection 6.16), as this was considered to be the most accurate There 
SJi 0W T’. l aPP ' a l “ 5' S ° me dan * er ° f “» * ntr °duction of bL by tS 
vambl, Mmptag facta. (Section 6.11) Sigh, have been p«f.3 

cT °4'e fieZ“? ° f r‘ POP “ Iati “ b »"S ecimated .o y 7f± M Z 

an interorefer Zd ? CUp,cd 6 ? observer ,ea ™. “ch consisting of an observer. 

anH tk P d dnVel ’’ Wlth 3 jee P> for three weeks. The entire sample 

and the computations were completed in 7 weeks. P 

4.17 Master samples 

Of r^pATlTely SSS 

“Z^g smal,er s,mpte d ' a ™ - yss b “ 

The use of a master sample has a number of advantages. It enables a more 

?ftk!T’ COmplete al J d adequate frame to be constructed^!! could be justified 
if the frame were only required for a single survey. It simplSX 

stmn? P i eS ’ Tk 6 ln ? e Sub " sam P lin g «»ly the material oontSnad 

^ t0 k-^ SdeCti0n pf0CeSS - * enables 4pTem™ 

taSrs Stevs AndT t “ ° f " due “ im P rovin g the acc ^acy of the 
that the c, y ’ d 1 nables surveys on the same material to be so planned 

surveys—a^atter oTso 1101 “ eXC , essive number of times for different 

response ^ " ° btainRd by 

^ The most extensive and elaborate master sample so far constructed is the 
master sample for agriculture of the United States of America The construe 

Coffevl HlS SamP C W3S Undertaken b y the Statistical Laboratory of Iowa State 

Bu^ tl ’of^^^^ Peratl0n IIV Wi,h ^ ° f A ^ural Economics and tt 

bureau of the Census (King and Jessen, 1945, G). 

the whole? thf Unhe?mL ma ?e Sam ? le h COnSiSt ° f ^ areas C0Verin g 
sauare mil*. hi 1 ’ The units have a mean area of about 2-5 

m?I 1 ’ b Vary accordln g t0 location and other circumstances the 

weTe for 6 " T *** fr0m °' 71 S( i Uare miles *> ld 8 square mlesf hey 

P “: d ££■ “ot m 4b 

master sample. eighteenth of all the areas were selected for the 

categories^calle'cf fiTti? ? ^ ? U ” lted States was divided into three 

strata are (1) the incorporated s^^^Vthe?^ TbeSe P rim ary 

open-countrv stratum tk ’ ( 2 ) the unincorporated stratum, (3) the 

p country stratum. The incorporated stratum Consists of incorporated 
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cities and towns and unincorporated places regarded as “ urban ” by the 
Bureau of the Census. The unincorporated stratum consists of all named 
places outside the incorporated areas which have an estimated populat 
of 100 or more, and all other areas which appear on the map and have 
population density of 100 or more persons per square mile. 

The incorporated areas were defined by the corporate boun aries, 
the location could be obtained. The unincorporated ar^s were demarcated 
on the maps so as to give areas as compact as possible, while including every 
thing that ^hd not appear to be open country. Subject to this, the boundaries 
werf chosen so as to be easily identified on the ground. Aerial photographs 

transportation maps showed with varying degrees 
of JccuracP the locaJn of farms and other dwellings m *e 
areas and to some extent in the smaller unincorporated places and these were 
therefore used to demarcate the actual sampling units of the open-count y 
steatti The procedure was as follows. The numbers of farms a^d non¬ 
farm units were first counted in what are termed count units omthe 

map. A count unit consists of a unit defined by minor civil bo ™ c ^ ries ° r 
natural boundaries, and in general included from 6 to 30 farms. These count 
units were numbered, and the number of farms and the total number of dwe mg 
including farms were marked on the map. The number of sampling umts ‘nto 
which each count unit was to be subdivided was also P 4 

the map • in making this decision, consideration was given to the prevalenc 
of nalurai boundaries, etc.. The data for each count unit were then recorded 
on punched cards, and cumulative totals of the farm count, the dwelling count 
and'the number of sampling units, were tabulated. These cumulative tota 
were used to determine the count units which contained selected units, a random 
number between 1 and 18 being chosen as a starting-point, the ^ nt U 
containing every 18th sampling unit being selected thereafter. The count 
units containing selected sampling units were nex subdivided on the p 
into the specified number of sampling units, the subdivisions being so chosen 
S ty col be located on the ^ouud. The units so demarcated were 
then numbered or counted systematically and the appropriate sampling unite 
selected. Existing aerial photographs were used extensively for the demarcat on 
of boundaries. In cases in which there were no suitab e natural boundaries 
on the maps or photographs two or more units were unit 

and random selection being subsequently made on the ground if either un 

WaS So e me C what different procedures, which need not be detailed here, were 
followed for the unincorporated and incorporated strata. For the ^ 0 ^ ated 
stratum information was obtained from the Bureau of the Census on numbers 

° f iHte final form the master sample will provide an adequate master sa " 1 ^ 
, of both farms and population, and also of the land area 0 {^h 0 k °f the 
United States. Because the sampling units consist of areas, the frame w 
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remain complete and adequate whatever changes occur in the course of time 

of dwT leme T y I ”[ 0rmatlon Prided by the number of farms and number 
mg units will naturally become progressively more inaccurate but 

2 y “I*' Pk “ °" 1? andrSas^ 

There w h h f “T* ° { J£ T reVeal the extent of these inaccuracies, 
nece s a 7;L*rof re ’ ' f n ° the sam P le when it appears 
TS ilT/ f m ^ 638 ° f thg C ° Untry Where extens ive changes have occurred 

country. S “ °° ^ the existin S sam P^ *« the rest of the 

imdS-Ti? bC See > 1 tha ! tht , construction of a master sample of this type is a major 
undertalung, and it should not be assumed that a master sample of the same 

dIffemnt neC Thus • T ^ countri “ in "Inch the conditions are 

orovide .n Th V n Um * ed Kln S dom the 6-inch Ordnance Survey maps 
provide an excellent frame for land area surveys, and the register of farms 

° , f maintained for .the collection of agricultural statistics provides a very 

necessarv t T * ™ ersam P le agriculture is ever considered 

returns of ft, tru ^° n C0lald be based °n this register and on the associated 

sfmnT 1 farme ?; L The task of construction would therefore be very much 
simpler than would be the case if no such register existed. On the other hand 
there is a need in the United Kingdom for an adequate master sample for 
localized population surveys. This problem is discussed in the nextTectiom 

4.18 Localized population surveys 

a j read y been indicated in Section 4.9, surveys are ofteq required 

n f 1C f glVe reasonabl y acc urate estimates for the country as a whole but 

concentmedlnT? SI ? a11 . administrative ^cts. Such surveys have to be 
concentrated m a few localities, particularly if they are to be carried out by 

Id investigators, since the amount of travelling would otherwise be excessive 
and superv.sion difficult. They may therefore be termed 
mult -smge process must be used, the units at the first stage being administrative 

covered Z Z ri ? * UCh ^ ** each Selected unit is enable of bring 
covered by a single investigator or a small team of investigators. 8 

stave of theT,mIr he Pt ' 0bIem ; th f efore > consists of so planning the primary 
stage of the sampling process that the sampling error at this stage is not excessive 

A sec° ndar y consideration, which must not be ignored, is that the ^thim 

cstimateTfffi?^ 8 f be ^ciently numerous to furnish a reasonable 
estimate of the sampling error at the first stage. 

in ° f f Stratification is obviously indicated.' This stratification must 

n the first instance serve to differentiate between urban and rural areas 
Consequently the country should be divided into large cities, into smaller 
in th* areaS . and mt< ? rU r d ateaS ’ in a manner somewhat: similar to that employed 
will den m T ° f the United States - The number of d **es required 

^ ° £the t0WnS ^ areaS ‘ A variab le sampling 
fraction will be required m association with.this,stratification ; for most surveys 
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it will probably be necessary to take all of the very large towns, but a proportion 
only of the intermediate towns, a smaller proportion of the smaller towns, 
and a still smaller proportion of the rural areas. Regional stratification of th 
smaller towns and rural areas may also be adopted as far as possible in p 
with this stratification. 

These two types of stratification by themselves, however, are not likely 
to be entirely adequate for the urban areas, and some further form o stra 1 ca 
tion may be sought which will ensure (a) reasonably correct proportions of 
areas of different industrial types, and ( b ) reasonably correct proportions of the 
different social classes. 

The methods by which it may be possible to ensure this will vary greatly 
according to the nature of the country, the type of primary unit that is adopted, 
and the amount of information that is available on these primary units, 
ministrative areas are usually most suitable from the point of view of the amount 
of readily available information, but they are not always ideal from the sampling 
point of view. As far as the United Kingdom is concerned administrative 
areas appear to be the only possible type of area which can be used without 
a great deal of preliminary work. They will probably prove reasonably satis¬ 
factory if the boroughs and urban districts associated with the large towns 
are treated as parts of these towns, and sampled fairly intensively, lhus, 
for example, the sampling of the various parts of London and of its satel 1 e 
suburban towns should be considered as a special problem separate rom 
of the sampling of the smaller towns in other parts of the country. 

The second-stage sampling of the selected first-stage units is not likely 
to present any very serious problems. In the very large towns such as London 
and in dispersed rural areas, two or more stages are likely to be required to 
avoid excessive travelling. Adjustment of the sampling fraction at the final 
stage to give equal overall sampling fractions is often advisable since estimates 
can then be rapidly and simply obtained. Provision at the final stage for a 
proper rota of households to be included in the different samples, so as to avoid 
using the same household too frequently, is also of importance. 

Much further research work remains to be done before: it can be said wit 
certainty whether a sample of this nature covering t e mte m ^-ir^L nt 
likely to be satisfactory for all purposes, or whether samples having a different 
structure will be required for different purposes The unporance * 
investigating the possibility of obtaining such a sample is clear. 1 ► 

localized sociological and economic surveys of the general population can 
be carried out with any high, and at the same time ascertainable, degree ot 

accuracy. 


4.19 The U.S. series of employment estimates 

■ An early example of a localized sample is that set up in the United States 
in 1939 to provide regular and speedy statistics on unemployment, employment 
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and the labour force (Frankel and Stock, 1942, F). The sample was modified 
and improved in 1943 (Eckler, 1945a, F). 

In the original sample, counties were used as the first-stage sampling units. 
All the 3097 counties of the United States were classified and sampled as 
follows:— 


Cities 

Total No. 
of counties 

9 

Percentage of 
population 

14 

No. of counties 
in sample 

9 

Urban 

447 

50 

28 

Rural 

2641 

36 

27 


3097 

100 

64 


The 9 city counties relate to the 5 largest cities ; all these 9 counties were 
included in the sample. The urban counties are those with 1930 populations 
of 45,000 and over. In the urban and rural classes a triple stratification, each 
of three strata, was adopted, the bases of the three stratifications being 
population, administrative areas, and percentage unemployed. Divisions 
between each of the main strata were so chosen that approximately equal 
numbers of counties fell in each main stratum. There were thus 27 sub-strata 
for both the urban and rural classes.* One county was selected from each 
of these sub-strata at random, with one exception where two counties were 
selected. 

A further two-stage process was used to sample the urban and rural areas 
within the selected counties. The numbers of households to be selected from 
the various urban and rural areas were allocated on the basis of the census 
population figures for these areas. This led to the gradual development of a 
differential bias between urban and rural areas, owing to a drift of population 
away from the rural areas. 

The results from within a single county were aggregated without any 
weighting. The aggregates were then weighted according to the population 
of the stratum from which they were obtained. 

The within-county sample was, changed every 4-6 months. This was a 
Compromise between having a constant group of households, which would 
give most accurate estimates of monthly changes, and having new households 
on each occasion so as to avoid repeated visits to the same household. It 
introduced a certain discontinuity into the results, which has been avoided 
in the modified sample by using a proper system of partial replacement of the 
type described in Section 3.17. 

In the modified sample, which included 68 first-stage units, allocation of 
households on the basis of population figures was abandoned. Instead, small 
areas were used as units at the second stage. This eliminates bias resulting 
from population drift. 

* It will be noted that the numbers of the counties in the different sub-strata 
were not by any means equal. 
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Several other features were also introduced in order to improve the accuracy 
of the results. The stratification was more detailed, selection of the primary 
units was with probability proportional to their populations, and the ratios- 
between the numbers of households having certain contrasting characteristics, 
e.g. farm and non-farm, were adjusted in each selected first-stage unit to agree 
with the corresponding ratios in the stratum to which the unit belonged. This 
last procedure is not entirely free from danger of bias. In a unit with a relatively 
small proportion of farm households, for example, those that do occur may 
be expected to be somewhat abnormal owing to proximity to non-farm areas, 
and such abnormal households will consequently receive excessive weight 
in the final results. 

In both these samples only a single unit was selected from each of the 
first-stage strata. Although this unquestionably increases accuracy by permitting 
the use of smaller strata, it has the consequence that no fully valid estimate 
of error is available. The best estimate is that obtained by combining the 
strata in pairs, and this is likely to be somewhat of an overestimate. 


4.20 Frames suitable for special classes of a human population 

Surveys of special classes taken from the whole of a human population 
are often required. If a general frame covering the whole population is 
available, it can be used for a survey of a special class by selecting a sample 
from the whole population, and rejecting those members which do not fall 
in the required class. If the frame itself does not contain the necessary 
information, this will necessitate surveying all units of the sample in order 
to find out which individuals are to be retained and which rejected. If the 
required class is only a small fraction of the whole population there will be 
a large proportion of rejects, and a disproportionate amount of work is there- 
fore required in such cases. 

Consequently, if a frame covering only the required class or classes is avail¬ 
able, this should be used in preference to a general frame. In surveys of the 
labour force, for example, it is often possible to utilize unemployment 
insurance registers and similar records. Such frames are often to a certain 
extent inadequate—all types of labour may not be included in an unemploy¬ 
ment insurance scheme, for example—but their greater convenience frequently 
outweighs their defects. Occasionally it may be considered advisable to cover 
the excluded classes with lower accuracy by means of a general frame. 

When no partial frame exists a survey undertaken for another purpose 
can sometimes be used to provide one. Thus in a recent survey of the aged 
carried out by the Nuffield Trust in certain towns of the United Kingdom* 
the records of an earlier survey by the Social Survey of the Central Office 
of Information covering all households were used to locate those households 
which contained aged people. 
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is no easy way of measuring. Much of the information provided by public 
opinion polls is therefore of doubtful significance. 

On the other hand, if used with skill the quota method may give sufficiently 
Accurate results in simple enquiries where only general indications of the 
opinions held are required. If the samples are taken in the same manner on 
different occasions, and circumstances remain broadly the same, it may also 
provide a not-too-inaccurate measure of changes of opinion. 

4.23 Frames for agricultural censuses and surveys 

Agricultural censuses and surveys can be carried out in collaboration with 
the farmers, or in certain circumstances by direct observation without contacting 
the farmers. The latter method is in general only applicable to surveys of 
agricultural crops, and then only if all particulars required are ascertainable 
by inspection. For censuses and surveys of livestock the collaboration of the 
farmer is usually necessary, the essential difference being that livestock is 
mobile whereas crops are immobile. Collaboration is also obviously required 
if information relating to the farrq as a whole is needed. In many countries 
contact with the farmer is advisable even for crop surveys, because exception 
may well be taken to the examination of a crop without the farmer’s permission. 

If a census or survey is to be conducted by contacting the farmer, the farm 
will usually form the sampling unit at some stage of the sampling process. A 
frame covering farms will therefore be required. Such frames are provided 
either by lists of farms, or by some form of area sampling which serves to locate 
the farmhouses. Frames based on maps, etc., which are suitable for the sampling 
of human populations in rural areas (Section 4.14) are equally suitable for 
the sampling of farms. 

If contact with the farmer is not necessary maps can be used directly as a 
frame for crop surveys. Their use for this purpose is discussed in the next 
section. Even in this case, however, farms may well provide the best available 
frame. 

In crop surveys the natural unit for many purposes is the field and not 
the farm. In cases where it appears advisable to obtain information for some 
only of the fields of a farm under a given crop, a further stage will have to be 
introduced into the sampling process. This inevitably results in a somewhat 
complicated sampling structure with different sampling fractions for the different 
parts of the sample, which in turn introduces complications into the analysis 
of the results, at least if unbiased estimates are required. 

An example of this type of survey is provided by the Survey of Fertilizer 
Practice, carried out in various counties of England and Wales from 1942 
onwards (Yates et ah , 1944, G). The objects of this survey are to determine 
the way in which farmers manure the different crops, and the relation of this 
manurial practice to the fertilizer requirements of the soil, in so far as these 
can be determined by the current methods of chemical soil analysis. 

The method of selecting the samples is as follows. For each county a 
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systematic sample of farms is selected from the Ministry of Agriculture’s 
addressograph list, maintained for the purpose of collecting the agricultural 
statistics on crop acreages and livestock. This list is arranged alphabetically 
by farmer’s name and parish, and shows the total acreages (crops and grass) 
of each farm. A variable sampling fraction is used, with three size-groups, 
about 100 farms being selected from an average county. Larger samples 
are taken from counties which can be subdivided into districts containing 
different types of farming. 

, Each selected farm is visited by a field investigator, who is a member of 
the Provincial Advisory Staff. All the fields of the farm are listed in consultation 
with the farmer according to their crops, and also according to whether they 
have been recently ploughed out from grass (new and old arable). In the 
earlier surveys one field of each crop was selected at random from all the 
old-arable fields, and similarly for all the new-arable fields. One permanent 
grass field was also selected. In the later surveys one field in three of each 
crop has been selected from each of these categories. From each group of 
selected fields one old-arable, one new-arable and one permanent grass field 
is selected at random, and soil samples taken for chemical analysis. 

For the selected fields information is obtained from the farmer on the 
cropping over the previous four years, and the amounts and chemical composi¬ 
tion of the fertilizers, farmyard manure and lime applied in each year of this 
period. In some of the later surveys only a single year has been covered. When 
necessary the fertilizer merchants are consulted in order to obtain information 
on the chemical composition of the fertilizers. 

The methods of analysis adopted in this survey are illustrated in Example 
6.19. 

4.24 Use of maps as frames in agricultural surveys 

If accurate large-scale maps showing the field boundaries are available, 
the point method of sampling is very suitable for crop surveys in which contact 
with the farmer is not necessary. The fields will then act as sampling units, 
and selection will be with probability proportional to size. Provided the whole 
of a selected field is under a single crop, all that is necessary for acreage estimates 
is to ascertain the crop, no determination of are^ being required (Section 3.9). 
If more than one crop is being grown on a selected field, the proportions of 
the area under the different crops must be determined, but eye estimates will 
usually be adequate for this purpose. 

In this type of work two-stage sampling will often be advisable in order 
to save travelling, and also to avoid having to handle an excessive number of 
maps. Thus in the United Kingdom the 6-inch Ordnance Survey quarter- 
sheets (3 miles X 2 miles) might provide suitable first-stage units, a fairly 
dense grid of points being taken over the selected sheets. 

If selection with equal probability of irregularly-shaped areas such as 
fields is required, these areas must each be defined by a single point, such 
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of ascertainable reliability. By the procedure of first surveying a properly 
selected quarter of the whole sample, it was possible to obtain the preliminary 
estimates in the required time in spite of unexpected delays in the commence¬ 
ment of the survey. 

4.26 Frames for undeveloped areas 

If no accurate maps are available, exact location of previously demarcated 
stfiall sample areas on the ground will be impossible. Alternative methods 
must therefore be employed. 

For completely undeveloped areas such as natural forests the line method 
of sampling is very suitable, provided the terrain and vegetation is such that 
the lines can be followed on given compass bearings without an undue amount 
of deviation. Distances along the lines can be determined by some simple 
measuring device such as a rope, or even by pacing. Where volume measure¬ 
ments are required small areas can be demarcated at given distances along 
each line. 

Some frame for the location of the lines is necessary.. This can often be 
provided by existing mapped roads or other tracks, but it is by no means im¬ 
possible to construct a secondary frame as the survey proceeds by the use 
of cross traverses, using any available tie-in points. Except where maps are 
to be constructed, no great accuracy in the location of the lines is required, 
since it is only necessary that they be located in an unbiased manner with a 
density which is the same for the different parts of the area, or, if not the same, 
is determinable. 

In areas in which a line on a fixed bearing cannot be followed, any attempt 
at complete and unbiased coverage must necessarily be very expensive. 
Often, however, a sufficiently unbiased sample of natural vegetation will be 
obtained by traversing existing tracks and taking sample areas at suitable 
intervals by offsets at right angles to the tracks. If a map of these tracks is not 
previously available it may be worth constructing one by rope and sound or 
similar rough surveying technique. 

Crop surveys in partially developed areas without adequate maps present 
somewhat different problems. If the cultivated areas are located in the neigh¬ 
bourhood of villages, a two-stage sampling process will probably be required, 
a sample of villages being taken at the first stage. Since the total area of 
cultivated land associated with a village is likely to be closely correlated with 
the population figures, these (if known) should be treated as supplementary 
information. If not known the feasibility of making a simultaneous population 
census should be considered, since information on cultivated areas will be 
of more value if it can be related to population figures. In this case the sampling 
may well be two-phase, a larger sample being taken for the determination 
of population. 

The survey of the cultivated areas associated with the selected villages 
will require the construction of second-stage frames. If the line or point 
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method of sampling is practicable this is likely to be the simplest method of 
dealing with compact areas of cultivation. Outlying fields will in this case 
have to be enumerated and sampled separately. 

In many cases enumeration of all fields will be the only practicable method. 
The preparation of a sketch map will then be advisable. A certain percentage 
of the enumerated fields can be measured for area, and the crops determined 
if this has not been possible at the mapping stage. If the cropping is known, 
stratification by crop should be made before the selection of the sample for 
area measurements. A frame of this kind may remain serviceable, with some 
revision, over a number of years. It will also serve to locate the samples 
required in a crop estimation scheme. 

4.27 Use of aerial survey photographs 

When no maps are available the possibility of using aerial survey photo¬ 
graphs as a frame for agricultural and land utilization surveys should be borne 
in mind. Although it is unlikely to be practicable to make an aerial survey 
simply for the purpose of providing a frame for sample surveys, it is often 
possible to utilize a survey that has been undertaken or is contemplated for 
other purposes. 

Any aerial photographs covering the area are likely to provide an adequate 
frame, though the use of aerial survey photographs even for a frame is not 
as simple as it appears at first sight. The mere handling of the photographs 
covering any large area is a somewhat difficult task which demands an adequate 
and properly trained office staff. Moreover, aerial photographs are subject 
to variations of scale (and also distortion) due to tilt and changes of altitude 
of the aircraft. The stated scale is therefore not always correct, and the scale 
sometimes exhibits disconcerting variations even over different parts of the 
same mosaic. The precaution should therefore be taken of checking the scale 
by means of measurements on the ground in a sufficient number of instances 
to make certain that no important source of error is introduced. 

Various methods of sampling can be used in conjunction with aerial 
photographs. If crop acreages have to be determined, point sampling is 
suitable. After the points have been marked on the photographs the fields 
in which these points fall must be identified on the ground and the crops 
growing on them recorded. In order to avoid excessive travelling it will almost 
certainly be worth using a two-stage process, the units at the first stage being 
rectangular areas which can be demarcated on the photographs, with a number 
of points taken within each of the selected areas. 

If line sampling is required, the lines can first be demarcated on the 
photographs, and subsequently surveyed on the ground. In certain 
circumstances it may be possible to make the intercept measurements on the 
photographs, using the ground survey merely to determine the characteristics 
of the various intercepts. 


86 



PROBLEMS ARISING AT THE PLANNING STAGE SECT. 4.28 

If areas such as fields, the boundaries of which are recognizable on the 
photographs, are to constitute the sampling units, they may be selected with 
probabilities proportional to their sizes by point sampling. If there are likely 
to be ambiguities in the definition of the boundaries the units should be 
demarcated before selection. 

If natural units such as farmhouses which depend on point locations are 
to be selected, small rectangular or circular areas may be used as sampling 
units in the same manner as in selection from a map. 

In certain cases aerial photographs may provide the necessary information 
without any ground survey work. It is usually possible, for example, to 
recognize cultivated areas on the photographs, and the total cultivated area 
may consequently be determined directly from the photographs. In certain 
cases it may even be possible to differentiate between the different crops. In 
these cases the total cultivated area and the proportions of the area under the 
different crops can be determined by sampling of the photographs, point or 
line sampling being used as convenient. If desired, adjustments for variations 
in scale can be made by varying the spacing of the points or lines. 

In some cases the differentiation between the different crops on the 
photographs may be only partial, or subject to error. In such cases a sub-sample 
of the points classified on the photographs can be re-classified by ground 
survey. The information provided by the photographic classification will then 
serve as supplementary information. By this procedure the amount of ground 
survey necessary may be very considerably reduced. The examination of 
stereo-pairs may be a considerable aid to the classification of certain types of 
area, particularly forest areas. 

If an aerial survey is specially undertaken for the purpose of a sample 
census or survey, it is possible to reduce the amount of photography by taking 
parallel strips of photographs separated by unphotographed areas, but aerial 
photographs taken in this manner will not be of much use for mapping purposes. 
If no map frame is available, a few cross-strips will have to be taken to provide 
links between the separate strips. Too much reliance must not be placed on 
the accuracy of the location of the strips unless special navigational aids are 
installed. 

4.28 Crop estimation 

The total yield of a crop can be regarded as the product of its acreage and 
the mean yield per acre. These two quantities may therefore be estimated 
separately. Estimates and forecasts of the mean yield per acre must of course 
be related to the conventions adopted in the estimation of acreage, particularly 
with regard to areas on which the crop has failed or been abandoned. 

The estimation of acreage has already been discussed in the preceding 
sections, and in this section we shall therefore mainly be concerned with the 
problem of the estimation of the mean yield per acre. 

There are a number of ways in which estimates of the mean yield per acre 
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of a crop, or the total yield, may be obtained. These may be broadly classified 
as follows :— 

(1) Reports from crop-reporters, who, at or subsequent to harvest, make 
returns to a central authority of their estimates of the average yields 
of the crop in their own districts, these estimates being based in the 
main on general impressions, discussions with farmers, etc. 

(2) The harvesting of small sample areas of the crop immediately prior 
to the main harvest. 

(3) Eye estimates of the yields of a sample of fields, with subsequent 
calibration of these eye estimates by comparison with the actual yields 
of some at least of the sample fields. 

(4) Co-operation with the farmers at harvest time so that accurate yield 
figures may be obtained from a sample of fields as they are harvested. 

(5) Returns by farmers of the yields of their crops. 

(6) Market returns, export statistics, etc. 

If necessary, Methods (2) and (3) may be combined in a two-phase sampling 
scheme, eye estimates being taken from a comparatively large sample of fields, 
with crop-cutting samples from a smaller sub-sample of these fields. 

These various methods all have their advantages and disadvantages. 
Method (1), that of crop-reporters, is the one commonly adopted by countries 
with long-established and stable systems of agriculture. Its success depends 
on the ability of the individual crop-reporters to make accurate and unbiased 
estimates of the average* yields of their districts. The method is not objective, 
and no assessment of its accuracy can be made unless independent estimates, 
provided by some other method of known or ascertainable accuracy, are 
available for comparison. Doubt is often cast on estimates provided by the 
method because of disagreement with market returns, etc., and their lack of 
objectivity makes it impossible to say which set of estimates is at fault. 

Even if crop-reporters are reasonably accurate on the average over a run 
of years, estimates for particular years or particular districts may be subject 
to considerable errors. There seems to be a general tendency, for instance, 
to underestimate yields in good years and overestimate them in bad years. 
The accuracy attained may also be very different for the different crops. 
Moreover, spurious long-term trends may be introduced through gradual 
changes in the standards of "the reporters, and this considerably reduces the 
value of the estimates as a measure of the improvement or deterioration of the 
agriculture of a country. Any sudden change in an agricultural system, such 
as the introduction of new varieties, or the bringing into cultivation of new 
land, may introduce disturbances into previously satisfactory estimates. 

Method (2), the harvesting of small sample areas, is theoretically capable of 
providing a completely objective estimate of the mean yield per acre of the 
standing crop at harvest time ‘; it will not, of itself, provide any estimate of 
the losses at or subsequent to harvest. In practice, however, serious bias 
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may arise in a number of ways if proper precautions are not taken. These 
sources of bias, and the practical details of the method, are discussed further 
in the next section. 

Method (3), that of eye estimates, has the advantage that on certain types 
of crop such estimates can be relatively rapidly made, and consequently a larger 
sample of fields can be visited in a given time. The difficulty of having 6 to 
transport and thresh a large number of samples, which often arises with 
Method (2) is also avoided. Some results of a trial of this method on wheat 
are given in Example 6.15. The method is not suitable for root crops such 
as sugar beet and potatoes, since it is difficult to judge the yields from inspection 
ot the tops. In such crops, however, there are no transport and threshing 
problems, since the samples can be weighed in the field. 

If calibration of the eye estimates from the farmers’ yields is not practicable 
or if the calibration is found to vary substantially from year to year a few 
specially-trained field workers can be used to take crop-cutting samples in 
° rd ^ t0 i. Ca , lb /? te th , e „ eye estimates of each investigator at the time of harvest. 
,.^ Met . hods (f) ? nd ( 5 ) require the co-operation of the farmer. Method ( 4 ) 
differs from Method (5) in that in Method (4) the harvesting is done in the 
presence of an investigator, and if necessary with assistance, such as the 
provision of a threshing machine, whereas in Method (5) reliance is placed 
entirely on the farmer to provide accurate yield figures. Owing to delays of 

threshing, etc., Method (5) is not likely to provide estimates till some time 
atter harvest. 


Estimates from market returns, export statistics, etc. (Method 6) provide 
a useful basis for comparison with estimates by other methods, but such returns 
will only exceptionally give an accurate estimate of the actual yields since 
the amount of the crop passing through the market is likely to vary very 
considerably in different circumstances. 

In Methods (2) and (3), which require field investigators, the question must 
be considered whether the survey should cover the whole of the country or 
whether it should be confined to certain districts only, using a two-stage 
sampling process. If only an estimate of the yield of the whole country or 
ot large districts is required, comparatively few fields will need to be sampled 
and a single-stage process for the selection of fields will result in a very 
dispersed sample. A two-stage process will avoid this difficulty at the cost of 
introducing a between-districts component of variation into the sampling error 

f inally, it should be emphasized that crop estimation, though theoretically 
simple, presents many practical difficulties. The introduction of a satisfactory 
scheme where none exists, or the provision of objective estimates to check 
existing subjective estimates, requires continuous work over a number of years 
by a properly established team of workers. Except for preliminary investigations 
crop-estimation projects should therefore not be undertaken unless continuity 
can be maintained. Nor should an existing method of estimation be abandoned 
or disturbed until a better alternative has been evolved and kept in operation 
tor some time. If the old and new methods are run in parallel for a number 
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of years it will be possible to assess the reliability of the old method and its 
degree of bias, a task which will be quite impossible if it is discontinued before 
adequate comparative data have been obtained. 

* 

4.29 Estimation of yield by the harvesting of sample areas 

As mentioned in the previous section, the estimation of the mean yield 
per acre of an agricultural crop by the harvesting of small sample areas presents 
many practical difficulties, and the results may be biased m a number of ways 
if the proper precautions are not taken. t 

Errors can occur through faulty selection of the fields, through faulty 
sampling of the selected fields, through failure to take samples from the fields 
at dates sufficiently near harvest, or through failure to sample some of t e 
selected fields owing to their having been harvested before they we 

V1S1 If d ’rigorous means of selection are employed there is no reason why the 
selection of the fields should be faulty. If, however, the cruise method is used 
fields being taken at equal distances along a given route in the manner described 
in Section 3.15, the estimate will almost certainly be appreciably biased, 
though this bias may be reasonably constant from year to year if the same 
route is followed each year. On the other hand, the use of the cruise method 
overcomes the difficulty of ensuring that the visits to the fields are made 
sufficiently near harvest, and also eliminates the risk of missing fields throug 
their having already been harvested. All that is necessary is to traverse the 
route at sufficiently close intervals of time, stopping the car at each sample 
point and examining the crop to see if it has reached a sufficiently mature stage 

for a sample to be taken. ., , 

An alternative procedure which is sometimes used is to follow the prescribed 
route and take a sample from all or a given fraction of the fields that are actually 
being harvested. This, however, may introduce an additional component of 
bias since, unless special precautions are taken limitations of time will result 
in the inclusion of a greater proportion of the fields which are harvested very 

With crops that do not have to be fully mature at harvest, e.g. potatoes, 
samples will normally have to be taken somewhat before maturity urdess 
information is available from the farmers as to when they intend to-lift. Wrfh 
such crops, however, it is usually possible to estimate the amount of gr 
between the time of taking the sample and date of harvest ; this latter date 
can then be determined by a subsequent visit. In the potato crop, for examp , 
investigation has shown that the weight of tops provides a fair indication of 
the amount of further growth that may be expected. 

The cruise method of sampling, therefore, provides a method of crop 
estimation which, though theoretically more liable to bias than a P ro P e ”“ 0 ^ 
selection of fields, may in practice give more satisfactory results, particularly 
in the estimation of yields per acre. It is also likely to be considerably more 
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«con° mic al in travel Which method is most suitable will depend largely on 
local conditions, and must be the subject of local investigation. g Y 

Bias in the estimation of the yields of the actual fields can arise from 
mproperlocatmn 0 f the samples and from cutting a larger area of the crop 

SonVT uml ”*• A “ “ m ' ,k of ” ch bi “ h “ be™ given”" 

Edge effects are also liable to give rise to bias, since in an irregularly shaoed 
field it is impossible without a great deal of labour to locate samples properly 
at random over the whole of the area. The method described in Exempli 3 2 b 
is clearly impracticable, and no simple method of traversing the fieldhas been 

In V nr d Whl t W1 ^ Cqual probabilit y of selection over the whole field 
In practice however, a systematic method of selecting the sample is quite 

?2"2*c ,hing is “ s “ lh “ a » ° f ■>» 

a The d ® ter ™“ation of the bias arising from headlands, lower yields at the 
edges of the field, and errors in estimation of the area of the field—the U K 
Ordnance Survey areas for example, include farm roads, hedges and ditches^ 
n be made if required by more rigorous supplementary observations on a 
small percentage of the fields. Often, however?die separate determination 0 f 
these components of bias is of no great practical interest, since the losses at 
^,t erb - est wlIlaIso affect the total amount of the crop that is finally 
?Sw f ° r C0n , sumptl0n ’ and the total bias is best determined^by comparison 

T merS reported yields on a sub-sample of the fields, or by determining 
these yields m co-operation with the fanner. ^ 

Two methods of locating the sample units have been found convenient 

corEerT 6 “ ^ C ° Untry - The first is to traverse the field diagonally from 
corner to corner using one or both diagonals, and locating samples J equal 

intervals along these diagonal, lines. The interval required can be calculated 
y pacing the diagonal or making an eye estimate of the number of paces 
Errors in the eye estimates are of little consequence, since the exact number 

ci S b^ mg U T. 1S 1I T aterial - Alternatively, if the crop is in rows the field 
can be traversed along the rows. The length of one end is paced from cornel 2 
to corner B, the field being entered at a distance of one-quarter of this length 

fhdT IT" \ 11 18 then 1 tlaVerSed don S this row to the other end of the 

field, and a return row is selected by the same procedure. A suitable number 

sampling units is taken at each traverse in the same manner as in the case 
of a diagonal traverse. In this method of sampling it is advisable to step 
lately across * ^ SmaU number of rows after each sampling unit has 
been taken, since a given row may fall wholly on a particularly good or bad 
strip of the field, e.g. on a ploughman’s “ land.” 7 g 

Whatever the exact method of traversing the field, it is of the utmost 
importance that the location of the rows and of the sampling units should 
be made without inspection of the crop in the neighbourhood. 8 This can be 
done quite effecvely by counting p,^ mcl dig | ng ^ the 
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requisite number of paces have been taken, but the field workers must be 

,h0 A U |ood ™* h L P be» done on the most suitable sixe and shape 

the sampling units. In this country experimental tests have shown that 
£££ quarter-metre rowlengths are »i«.ble ^cereal 

T? large-scale, 

nature ot the crop a ^ rQWS or broadc ast, variability within fields 

available iuipment (o, threshing and transport, etc. Local inve.uga t.on s hodd 
therefore always be undertaken if any extensive work is contemplate . 

provided bias is avoided. (See Section 8.12.) 

4.30 Crop forecasting 

The term forecast is here used to denote an estimate of the yield of a crop 

furltd a, L. date -Q J* term i. — 

indicate estimates made by P P r „ bb „ in the light of information 

™d fmm“,me“ Sncjes.l.es, however, are better termed pr«<-»«-T 

of the mean yield per acre. forecasting. Forecasts can be 

There are three main methods ot crop torecasung , . , 

observations and physical measurements of th g g P>. 

xrAtr 5 c t"” rr ? <*— 
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meas Urements on t h e growing crop are tQ be uged The evaluadon of these 

datums requires the application of the method of statistical analysis known 

be welf? SS10n r yS1S u We Sha11 n0t describe this method here > but it may 
em P h f 1Ze that lts application is not entirely simple, and the advice 
be slight ^ statistician experienced in this type of work should therefore 

whilh" 1 ^ bC aSS “f Cd Aat k WiH be P° ssible t0 evolve a prediction formula 
Ze lTt S f Sf f t0ry , rCSultS ’ even if accurate and extensive data 

meteor 7 ^ 1 In , tbe first P la ce, the yield of a given crop is influenced by 
meteorological and other events up to and sometimes after harvest, and this 

mnLr7J UC V 00 grCat V iegree of uncertainty into yields predicted some 

Emhlem 7 eSt V 7“ the P f diction of an y va lue. In the second place, 
although meteorological factors undoubtedly account for a good deal of the 

variation in crop yields, they are not by any means the only factors. Changes 

in variety, insect pests, plant diseases, exhaustion of the fertility of the soil 

and manv^ther 77 ° f and under cr PP- changes in the amount, of fertilizers,’ 
7 7 f 8 m f y a S ° CXert 3 major infl uence. Thirdly, meteorological 
dpT- /° meWhat com P lex - and it may therefore be impossible to 

• nC tbem fr ? m a Se l ° f data extendin g over a limited number of years ; 
mimhpr f 7 e t S, ? l arity ° f weather conditions over large areas, data from a 

oveT a rf '"S °” ^ *" °” ly a Par “ SUbSti,UK f ° r da,a 

w/V,^ 116 advantage °f using measurements of crop grdwth instead of reiving 
holiy on meteorological observations is that the crop is thereby used as its 
own integrator of meteorological and other effects up to the time of the 

onr^b^T' Fr ° St and , fl °t° d damage ’ for instance, are clearly better assessed, 
rneteomW- ha , Ve occurred ; b y purvey of the crop than by examination of 
meteorological records. The selection of the particular types of observations 

J d 7!l S i7 mentS u V e7 bkely t0 give an ade fluate basis for forecasting 
f ° n Wh , 1Ch e Ur el ' SC1Cntific research is squired, particularly in 

has alread f ^ Cr ° pS ’ In the case of root crops investigation 

already shown that the amount of growth made by the tubers or roots 

77 wel I /!” measure of the extent to which the plant is still growing, 
weight of tops, are likely to give satisfactory results. 

* l7 n ?7 e e J°! Uti ° n ° f , a satisfactor y meth o d of crop forecasting demands 
,sm7hl! 7°i 7 Ual yidds ° Ver 3 P eriod of years, an investigation of 
Once th^ 6 ! ° S 7 bC C ° mb ’ ned Wlth an objective crop-estimation scheme, 
usefnl h f , tV ' a ? nS a f d pby sical measurements which are likely to give 
useful informat/on have been decided, all that is necessary is to tike these 

fhr s77r S 7 3 SUb ' S T pk ° f the fields which wiU subsequently be selected 
for sampling at harvest. In the initial stages it may be better to carry out the 

observations on a special sample of fields, rather than on the more Latter^ 
sample which will be suitable for crop estimation proper. More intensive 
investigations can also be carried out on experimental plots on which different 
varieties are sown, and which are subject to different cultural treatments and 
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sowing dates. Experimental plots by themselves however, arenotlikelyto 
provide all the information required for the evolution of a suitable ^casting 
scheme, since the variation from field to field in a given district is often quite 
large, and the inclusion of a number of fields in the district usua y g^es 
much more adequate representation of the average meteorological effects m 
that district than will a single field. 

If observations and measurements are to be made on the growing crop, 
a sampling scheme will have to be devised in order that single plants ° r 
areas may be selected for measurement. A method of selecting wheat shoot 
for height measurements, for example, is described m Section . . 

principles to be followed in the location of the sampling units ,n *e field, 
or experimental plots am similar to those which operate in the selection of 

samples for yield estimates. 


4.31 Determination of the size of sample when the sample is fully 
random 

As has been indicated in Chapter 3, the size of sample required to achieve 
a given accuracy depends on the variability of the material and the extent 
to which it is possible to eliminate the different components of this variability 
from the sampling error. In this and the following section we will describe 
the procedure 9 which is appropriate for determining the size of a random 
sample and indicate the general relationship between the errors of a random 
sample and other types of sample. Detailed consideration of the more involved 
types of sampling must be deferred till Chapter 8, where the comparative 
accuracy of the various types of sampling is discussed. 

In the discussion of sample size we shall require the concept of standard 
error As already explained in Section 3.7, the sampling standard error of 
an estimate is a measure of the average magnitude of the random samphng 
error to be expected in that estimate. It also provides an indication of the 
frequency will?which errors of various magnitudes may be expected to occur 
(Section 7 3) In rough general terms, one-third of the actual sampling errors 
.mTgieater than the smndard error, and one-twentieth will be greater than 

twice the standard error. 

In the case of a-fully random sample from a large population the formula 
for the standard error of the estimate of the proportion of units of a given 
type i e. having a given attribute , is very simple. If p is the proportion o 
units of the given type in the whole population, and q = 1 - p is the proportw® 
not of the given type, the standard error of the proportion of units of the 
given type in a random sample of n units (which provides an estimate p of p) is 

given by 


standard error of p 
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S„r p P ':“,s; isTS« error of 5 per cent in ^ «™*' <*«-«■»«» 

32-3 2 

n = —= 42 

A similai^ calculation number re( I uir ed would be 1050. 

Hertfordshire S™ (Sle , 2 ? Il'7 ° f ^ “ d <»» «t 

Ex-up!. 7.2.b. '"Tb'” 0 " "i.” giT “ “ 

per farm is 18-6 a nr! V 3b 8. The mean wheat acreage 

100 X 36.8/18-6 - 198 qi P ercenta g e standard deviation is therefore 

because there^re ~a large numbei^cYfkrms 6 ?** “ here ^ la rge 

to determine the total wheat acreage of 0rder 

W m S3mpIe ° f farmS ’ - Si XS 

derived from random BamplJ afdifSnt^J^taS'S tST™ ° f 
are inversely proportional to the square roots of the numbersTtheT ^^ 
Conversely, to reduce the standard errors of the rlnH h amp,eS ' 

require to increase the size of the sample bv the <1 I ^ ^ We 

in order to halve the standard errors Ttt 7 ? SqUare ° f the ratio - Thus, 

of the sample by 4. f he reSuilts We must multiply the size 

4.32 Some general rules on size of sample 
Of WiU be “ e " tte th ' ot .1.= size 

whe„ ;,, md o!„ ,s .* re “ re| y s ™pi« matter 

the calculations are m e omolwa J . m ° r6 inVOlved of sampling 
that is being sampled P ^ and must be known of the material 

is, h^vltofen l h usS;r r nrelim hiCh ^ by a random «mple 

required in the more involved typesoY sampling ^if ° f Sample IikeIy t0 be 

unit is under consideration, the reduction Yi nnmh ° nly f ° ne .h v P e of sampling 
the more complicated types of samnlino i« A ’ ber * ° f units required with 
total variability which L removed bY?L determmed by the fraction of the 
stratification or by the use of sunnlem e lmpos ‘ tlon of restrictions such as 
possible to form a rough idea of theYkeY 1 1 ? for ” latlon - I<: is frequently 
of the characteristics of the material. ^Th^^T 011 a . gen f aI knowled ge 
crop acreages, using farms as samnlino „ n 'i •* ■ s ^ rve y designed to determine 
by size of farm and th^use o^varifbT ’ L t0 £ “P 66 ^ that Ratification 
such stratification will each give conside W flactlon m conjunction with 

.amplCe This is confirmed" by the 

consideration^he^situafion^s^ mm^com*? °t f aItCmative sizes are under 

already presented in Section 3 11 ThhY 'T 6 ’ | 3S 1S sbown by the results 
n section This ,s true also of multi-stage sampling. 
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The following general rules may be of value. Rules 1 to 5 are applicable 
to the case in which only one type of sampling unit is under consideration, 
rules 6 and 7 to the case in which more than one type of sampling unit is being 

considered. 

(1) The use of stratification, a variable sampling fraction or supplem"y 
information may in general be expected to increase the accuracy. 
Consequently the calculation of the number of units 

case of a random sample gives an upper unit to the JLfing 

required in any reasonable form of sampling using the same sampling 

units. , £ . 

(2) Stratification will only increase the accuracy substantially d there are 

marked differences between the different stiata. J characters 

usually larger for quantitative characters than for qu > 

i.e . attributes (Table 3.7.b). , 

(3) A variable sampling fraction can greatly increase ^ 

units vary greatly in size, or more generally in vanab ' llt > f ™™ ( 2 a ^ ve 
to stratum. Fractions which increase the accura^ for quantitative 
characters may reduce it for qualitative characters (Table o.« ■ )• 

(41 The use of supplementary information can greatly increase the accuracy 
( } in appropriatenesses, and often serves as an alternative to stratification 

(Table 3.7.b). . , 

(51 Since there must be at least one unit per stratum, more e ai 
stratification is possible with larger samples. In such ^cumstan^^ 
the increase in accuracy with increasing size of samp 
™pid is indicated b, ,h. squar.-coo. law. Conv^f- samples 
of a given accuracy the advantage of stratification may be^ccdbythe 
fact that reduction in the size of the sample necessitates an increase 
in the size of the strata (Section 8.15). 

(61 If sampling units of type A consist of aggregates of sampling units ot 
1 b ! households and individuals), the use of sampling units of 
Zl l in place of unto of type B will usu^ly ,-i. ,n fow« 
for a given amount of material in the sample (Table 3.11 .b and hxample 

(7) Ifmulfi-stage sampling is used, more final-stage unite> will be' Squired 
^ ' than will be the case with single-stage sampling of the final-stage un 
(Tables 3.7.b and 3.11.b). 

All the above rules are indicative only. The quantitative gains in accuracy 
or reduction in number of units required in any particular case muS ^ V ^ t of 
bv the methods described in Chapter 8. The final decision as to the type ot 
sampling to be adopted necessarily depends on the relative accuracy 

various methods and their relative costs. 

It is advisable at the planning stage to consider as far as p . urvevs 

in which the results require to be presented. In more complicated surveys, 
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to supervise the field work. Often an organization can be employed, which 
already has contact with the respondents, or which has on its staff individuals 
who are suitably qualified to act as field investigators. 

5.3 Design of forms 

Most careful attention should be given to the detailed design of the various 
forms that will be used in the course of the census or survey, especially the 
forms on which the observations and answers to questions are recorded. This 
applies also to the instructions and explanatory notes which accompany the 
forms. 

The content of the forms for the recording of the information is determined 
by the information that is required, and has already been discussed. They 
may be forms designed for completion by the recipients with little or no 
assistance, questionnaires which form the basis of interviews, or forms on 
which observations and measurements taken in the field are recorded^ by the 
field investigators. 

Each type of form presents its own difficulties of design. The simplest 
is that on which observations and physical measurements are recorded by the 
field investigators themselves. In this case, the chief points to observe are 
that the form is convenient to use, and that the results are set out in such a 
manner that they are convenient to abstract. Figures which have to be 
summed by the field investigators, for example, should be arranged vertically 
and not horizontally, as the investigators will not be using calculating machines. 

In surveys which involve observations and physical measurements it will 
almost always be necessary to supply field investigators with a separate set of 
instructions. Consequently there is no need for the form to carry its own full 
explanation, though it should of course be made as self-explanatory as possible. 
Experience has shown that instructions to field investigators should be very 
detailed, and should cover all possible points of uncertainty or ambiguity. 
Provision should also be made for revision and amendment as need arises, 
since it is extremely difficult to draw up a set of instructions which are completely 
unambiguous and deal with all possible contingencies. 

In forms of the census type, designed for completion by the recipients 
without assistance, very careful attention must be paid to the exact wording 
both of the questions and explanatory notes, so that there is no doubt in the 
mind of the recipient as to what is required. Detailed and lengthy explanations 
should be avoided as far as possible. Such explanations as have to be given 
should if possible appear in conjunction with the question to which they refer. 
The common practice of giving detailed explanatory notes on the back of a 
form is not very satisfactory, since it frequently results in the respondent 
filling in the whole or portions of the form without consulting these notes. 
Forms of this type should, if possible, carry a brief explanation of the reasons 
for the census. Even if this has been given in the press and elsewhere it is 
unlikely that all recipients will in fact have seen it. ;; 
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In forms of the questionnaire type designed for completion by field 
investigators the investigators must be instructed whether the questions are 
to be put in the exact form given, or whether they can be asked in a general 

orm. As already stated, in most cases the general form is more suitable but 
m questions on opinions, where different forms of wording may be expected 
to affect the answer, it may be necessary to adhere to an exact form. 

With the general form of question explanatory notes are often required 
m order to make clear to the investigators exactly what information is required, 
buch explanatory notes can either appear on the questionnaire itself or be given 
in a separate set of instructions. The latter course results in a much more 
compact form of questionnaire and is suitable when full-time investigators are 
used. The former course is more likely to ensure that all investigators are in 
fact aware of what is really required and is best when the investigators are 
carrying out the survey in the course of other duties. In a lengthy questionnaire 
this will necessitate the questionnaire being in the form of a booklet. Such 
questionnaires are more bulky and costly, and frequently entail more work 
m the coding of the results, but are nevertheless frequently preferable in these 
circumstances. 

Forms may be either printed or duplicated. Printing is much to be preferred 
as it results in much neater, clearer, and more compact forms. The ordinary 
type of duplicating paper is also not very suitable for writing on, particularly 

Small forms may be printed on cards instead of paper. Cards are often 
more convenient for field use, and in small surveys of which the results are 
analysed by hand the use of cards may save transcription before analysis. 
Alternatively Cope-Chat cards may be used (Section 5.10). 

Forms printed on paper may be made up in the form of blocks with card¬ 
board backs. This facilitates writing in the field. Alternatively they can be 
clipped on to a wooden board. If duplicate copies of the completed forms 
are required, provision should be made for carbon copies to be taken at the 
time the forms are filled in. 

Forms larger than foolscap should be avoided if possible. They are 
troublesome both to handle and to store. Forms of more than one sheet should 
also be avoided. It is usually better to use both sides of a sheet or card than 
to use two sheets. y 

Forms should always be subjected to a preliminary trial in the field. Only 
m this way will minor faults be discovered. In the case of questionnaires this 
test is best arranged in two parts : 

{a) a trial by investigators who are fully experienced in questionnaire 
work, and who are conversant with the problems under investigation ; 

(b) a trial by investigators of the type that are to be employed in the 
survey. 

The first trial will serve to determine whether the questionnaire is in the 
orm most suitable for eliciting the required information from the respondents, 
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exercised, or they may be individuals asked to undertake the work on a voluntary 

Kocio or for a small honorarium. * 

The problem of selection arises primarily in the case of investigates 
appointee? specially for the work. In order to secure a suitable type of person 
preliminary P tests should if possible be made of all applicants and ije ®ary 
work of newly appointed investigators should be carefully watched and super- 
“ In large-scale censuses and surveys, proper training courses should 
be arranged. If a pilot survey is undertaken this provides a valuable 
opportunity for training, and every attempt should be ma eto U1 ® 

team of investigators at this stage rather than later, even 1 
certain amount of additional expense. 

It is of the greatest importance that investigators, once they have b e 
trained and are found suitable, remain in the job. Every effort mus ere 
be made to see that the pay is adequate, and that the work is made as attractive 
as h“e c”„f die interview type of survey, iuvesuga.or, are 

sometimes paid on piece rates at so much a completed questionnaire. This 
is Tn general unsatisfactory, since it tends to lead to skimped work and to 
irregularities such as substitution of one respondent for another. 

ft should not be forgotten that field work of the interview type is very 
arduous and is found by almost all investigators to involve considerable ment 
strain Hours of work are also likely to be irregular, since if excessive 
non-response is to be avoided some evening interviews are almost inevitable. 
Investigators should therefore not be expected to work excessive y bng hours 
and should if possible be given a rest on other work from time to time. It is 
often advantageous to bring full-time investigators to headquarters at intervals 
and use them for office work such as abstraction and analysis of the results. 
This not only serves to provide a break from field work, but also enables them 
to gain a much better insight into the purposes of their work. 

Whatever the conditions of work and form of payment, there must be 
adequatTfield supervision. The supervisors should themselves undertake 
field work from time to time, so that they are in a position to appreciate the 
difficulties of the work, and should also contact the workers while they a e 
actually in the field. Provision should be made for personal contacts not only 
beteen supervisors and the field investigators, but also between supervisors 
and the headquarters staff. In long-term surveys it is also oftenadvartageous 
to arrange conferences of the investigators from time to time at which difficulties 
can be discussed and the whole progress of the survey reviewed. 

5.6 Control of the accuracy of the field work 

The best assurance that the field work shall be accurate is that the 
investigators are thoroughly trained in their work, and are capable, conscientious 
and keen. Nevertheless it is important even with the best investigators 
"keeo a close watch on the progress of the work. . 

In certain cases, particularly in surveys involving observations and physical 
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factors. If punched cards are used, some or all of this work may be carried 
out mechanically, subsequent to the punching of the cards. 

(2) Abstraction and coding of the results so that they are in a form suitable 
for analysis or for transfer to punched cards. 

(3) Punching (in cases where punched cards are used). 

(4) Counts and totals. 

(5) Preparation of the summary tables from these counts and totals, 
including adjustments for supplementary information and any weighting not 
already carried out. 

(6) Calculation of sampling errors and investigations of efficiency. 

(7) Critical analysis. 

Apart from the calculation of sampling errors and investigations of efficiency, 
which are described in Chapters 7 and 8, we do not propose to discuss these 
operations in detail in this book. In the following sections we will merely 
give an outline of the special points that arise at the various stages. 

5.9 Methods of handling the data 

There are four main ways in which the data accumulated in the course 
of a census or survey may be handled. These are: 

(1) An analysis direct from the forms. 

(2) Transference of the data to ordinary cards. 

(3) The use of cards with holes round the edges (Cope-Chat cards). 

(4) The use of Hollerith or Powers-Samas cards (punched cards). 

The primary function of any type of card is to enable the data to be sorted 
into different classes, so that the numbers of units and totals associated with 
these classes can be obtained without transcription. With plain cards the 
sorting has to be done entirely by hand, with Cope-Chat cards marginal 
punching gives some aid to the hand sorting process, while with punched 
cards the sorting is carried out mechanically, and the counts and totals are 
also obtained mechanically. 

In certain cases the data can be recorded directly on cards which are 
subsequently used in the analysis. These may be either ordinary cards or 
Cope-Chat cards. The use of cards in this manner is limited by the fact that 
the amount of uncoded information that can be conveniently recorded on a 
card is small, and also by the fact that cards tend to be damaged by use in 
the field. 

It is also possible to record information directly on Hollerith cards, either 
in a form which enables it to be read by the punch operator as the card is 
punched, or in a form that enables it to be punched automatically by the process 
known as mark sensing. The occasions on which either of these methods has 
any real advantage over the punching of cards from ordinary forms are 
somewhat rare in census and survey work. 

If only a single classification is required, the preparation of a summary 
directly from the forms is likely to be the most economical method of procedure. 
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If more than one classification is required, the use of forms may still be 
reasonably economical, particularly for small surveys, but the possibilities of 
using forms in this manner are limited by the fact that paper forms are not 
easily sorted or counted, and will not stand a great deal of handling.* The 
direct use of forms is also sometimes of value when a rapid preliminary summary 
of the salient features of a survey is required. In a large survey such summaries 
can usually be based on a small sub-sample of the forms. 

If the data are transferred to cards, some form of compression and coding 
is usually necessary. This enables the information to be recorded in compact 
form on the card, and also facilitates the subsequent counts and summation. 
If Cope-Chat cards are used, all information recorded in punched form must 
be coded, and with punched cards the whole of the information has to be 
coded in numerical (or exceptionally alphabetical) form. 

The fact that punched cards have all their information coded in numerical 
form has the disadvantage that the detailed information relating to separate 
units cannot be easily studied by means of the cards themselves. It is also 
difficult to record written remarks on the cards. This tends to make the 
analysis more mechanical. Punched cards are therefore unsuitable for analyses 
which require detailed examination of the whole complex of information 
relating to individual units. Even in surveys which are so large that analysis 
by means of punched cards is essential it is often advisable to arrange that the 
original forms are kept available, so that in any detailed investigational work 
the forms corresponding to selected cards can be extracted and examined 
when required. 

5.10 Cope-Chat cards 

Cope-Chat cards are cards which have a row of holes along each edge. 
A group of these holes can be assigned to each particular classification, e.g.. 
the answers to a specific question, each hole being taken to represent one 
class in this classification. The body of the card (front and back) can be used 
for recording written information. 

By means of a punch similar to an ordinary ticket punch, V-shaped notches, 
can be cut out of the card so as to obliterate any desired holes. If the cards, 
are arranged in a pack and a knitting needle is passed through a particular hole,, 
the cards punched in this hole will fall from the pack when the pack is lifted 
by means of the needle and thoroughly shaken. This enables cards to be sorted 
into different classes with considerably greater speed than would be the case 
if the information were merely recorded on plain cards, and the sorting had to 
be carried out by examination of each card. The Cope-Chat method of sorting 
is not fully reliable, since cards do not always fall out of the pack when it is 
shaken, but mis-sorts can be detected by visual inspection of the edges of 
the retained cards. If all classes of a given classification are coded in some 
mutually exclusive system a positive check will be available. 

* In making counts or calculating totals from forms it is usually best to sort the 
forms into the necessary classes. 
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The amount of information that can be recorded on the edge of the cards 
is limited, since the number of holes is limited by the size of the card. In 
the Survey of Fertilizer Practice, for example, 5 in. X 8 in. cards are, used 
with a hole spacing of just over 4 to the inch, giving 105 holes in all.* 

The punching of Cope-Chat cards is somewhat laborious, and if a mistake 
is made a new card has to be prepared. In this case the written information 
will have to be transferred to the new card. For this reason it is usual to mark 
the holes which have to be punched, and check the markings before actually 
punching. If a considerable amount of punching is to be done, a form of gang 
punch can be used which will punch a particular hole from a number of, cards 
at one operation. In this case the cards can be sorted into the appropriate 
classes and the sorting checked before punching. A key-operated punch is, 
also available. 

When the cards have been sorted they require to be counted by hand. 
If totals of numerical information are required, the summations must be 
performed on an ordinary adding or calculating machine, unless the numerical 
information has itself been coded. The counting of cards is a tedious operation, 
and is made more so by the punching round the edges. For some purposes 
it may be feasible to replace exact counts either by weighing or by measuring 
the aggregate thickness under a definite pressure. Neither method is very 
accurate, however. In a humid climate, for example, the weights tend to vary 
considerably owing to changes in moisture content. 

The use of Cope-Chat cards enables isolated cards having given 
characteristics to be much more readily extracted than is the case when plain 
cards are used. Cope-Chat cards are therefore of value for surveys in which 
units of particular types require to be identified subsequently. In this respect 
they have certain advantages over punched cards, since no elaborate sorting 
mechanism is required and the information concerning the selected units is. 
presented in written form. 

Cope-Chat cards also have the minor advantage that the proportions falling 
in different classes can be roughly observed by sorting the cards and then 
examining the distribution of the notches. 

The coding of numerical information on Cope-Chat cards can be carried 
out in a number of ways. If approximate values only are required the data 
may be grouped into size-groups. If exact values are required the simplest 
method is to allocate ten holes to each digit of the number, but this cap only 
be done if very little numerical information has to be coded, owing to the 
limited capacity of the card. An alternative is to use some form of two-hole 
code to represent each digit. The most compact is that based on four, holes., 
which are taken to denote the digits 1, 2, 4, 7. To code other digits the two 
digits whose sum gives the required digit are punched. Thus the punching of 
1 and 2 represents digit 3. This system is not self-checking on sorts, since 

* In both the Cope-Chat and punched card systems one corner is cut across 
diagonally on all cards so as to provide a check that all cards are right way round 
in the pack. 
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two, one or no holes may be punched. If a fifth hole carrying the value 0 is 
added, every digit can be denoted by a pair of holes, with the convention that 
4, 7 denotes 0. An alternative with five holes, which is simpler, but not self¬ 
checking on sorts, is to use the holes to denote the digits 1, 2, 3, 4, 5, digits 
over 5 being indicated by double punching. 

5.11 Punched cards 

Two different systems are available, known as the Hollerith and Powers- 
Samas. Both systems employ cards in which each column has 12 positions, 
in any one of which a hole may be punched. When re’quired, two or more 
holes may be punched in different positions in the same column. In both 
systems alphabetical information can be dealt with by means of a two-hole 
code. 

Hollerith installations employ 80- or 38-column cards. Powers installations 
employ 65-, 36- or 21-column cards. A given installation will only handle 
cards of one size. By using each column for two items of information, with 
a special form of multiple punching, the Powers 65-column card can be 
extended to give the equivalent of a 130-column card. 

The actual punching of the cards is normally done by a hand-operated 
key punch. Verification, which checks within certain limitations that the 
original punching is correct, is normally performed by means of a hand- 
operated verifier similar in construction to a punch. More elaborate punches 
of various kinds are also available. 

The main difference between the Hollerith and Powers systems is that in 
the Hollerith system the cards are read electrically, whereas in the Powers 
system they are read mechanically. This results in a greater flexibility in the 
Hollerith system, since the machines can be set up for any required operation 
by means of electric connections through one or more plug-boards. If the 
analysis is confined to sorting and counting, the two systems, apart from card 
capacity, have almost identical performance. For the more elaborate types 
of analysis, Hollerith equipment is more suitable than Powers equipment, 
particularly in surveys of moderate size where many different types of machine 
operation, which often cannot be planned in advance, are required on relatively 
small batches of cards. We shall here confine ourselves to a description of the 
Hollerith machines, but it should be emphasized that if Hollerith equipment 
is not available it may be preferable to utilize existing Powers equipment rather 
than send the work elsewhere or use methods not involving punched cards. 

The principal Hollerith machines are :— 

(1) The sorter, 

(2) The sorter-counter, 

(3) The tabulator, 

(4) The reproducing summary punch, 

(5) The multiplying punch, 

(6) The collator. 
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The descriptions which follow are not intended to give a complete account 
of these machines and their various modifications, but only an indication of 
the way in which they work and the simpler types of operation that can be 
undertaken with them. An expert should always be consulted when planning 
any extensive punched-card work. Initial consultations should take place 
before the coding of the material is undertaken. 

5.12 The sorter and sorter-counter 

“ The sor ter can be set to operate on any one column of the card. When 
the cards are passed through the machine they are separated into 12 boxes 
corresponding to the 12 positions of the holes punched in this column, with 
an additional box for cards with no hole in the column. If, therefore, a code 
representing some classification of the material into anything up to 12 classes 
is punched in the column, the cards corresponding to the different classes 
will be sorted into the different boxes. A classification with more than 12 and 
up to 144 classes can be coded on two columns, and by sorting successively 
on each of the two columns separation into all the classes can be effected. 
In the same way, if a group of columns or field is used to denote a nutnber, 
the cards can be arranged in numerical order by sorting first on the units, then 
on the tens, and so on. Equally, if two columns represent two different 
classifications the cards can be sorted into the various cells of the two-way 
classification so formed. If two holes are punched in the same column the card 
is sorted to the higher digit, unless sorting on this digit is suppressed. 

The sorter is normally used for arranging the cards of the pack into groups 
or into a given order prior to their passage through the tabulator. When this 
is done the whole of the cards are kept in one pack, Le* at the end of each sort 
the cards are collected from the separate boxes and the sub-packs are placed 
together in numerical order. 

If only counts are required it is possible to obtain these directly on a sorter 
with a counter device which registers the numbers of holes occupying the 
various positions in the given column. A machine with this device is called 
a sorter-counter. Sorting can be suspended during counting if desired. 

The ordinary sorter-counter counts on a single column only and does not 
print the results. For large-scale census work more elaborate types of sorter- 
counter are available which will count simultaneously on a number of columns, 
printing the results obtained in these counts. 

5.13 The tabulator 

The tabulator is a much more elaborate machine than the sorter. Its 
primary function is to add numbers punched in a given field from a group of 
cards. To effect this the numbers are read successively as the cards pass through 
the machine, being added on one of a set of counters which form part of the 
machine. The machine has a printing device which will print the totals 
accumulated in the counters, and will also, if desired, print numbers read from 
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the cards. The operation of obtaining and printing the totals is called 
tabulation, and that of printing numbers read from the cards is called listing. 

Most tabulators have a number of counters and print banks ; the totals 
of several fields can therefore be accumulated simultaneously. If the numbers 
in the fields concerned are sufficiently small, two or more fields can be 
accumulated in different parts of the same counter, thereby further increasing 

the capacity of the machine. . 

In order to enable the totals of the groups of cards in different classes to 
be obtained successively without stopping the machine and without having 
to feed in the groups of cards separately, a device known as the control is 
incorporated in the machine. This device is such that when wired to control 
on a given column, the machine will break control if the card following the 
one that is being added carries a different designation on the control column. 
This break of control stops the adding process and gives the machine certain 
instructions as to printing and clearing, e.g. it can be wired so that the total 
already obtained is printed and cleared before passing on to the next group 
of cards. Thus, if a pack of cards sorted into groups corresponding to the 
code on a single column is passed through the tabulator with the control wired 
to that column, the machine will break control at the end of each group and 
the group totals of any desired field can thereby be obtained. 

The control can be arranged to operate on a number of columns, and 
different stages of the control can be associated with the different columns. 
Different instructions can be given to the clearing and printing mechanisms 
according to which stage of the control is operating. Thus, for example, it 
is possible to obtain totals of main and sub-groups simultaneously by feeding 
the numbers from the given field into two different counters, one of which 
is cleared at the end of each sub-group and the other at the end of each main 

Counts can be carried out on the tabulator, either in conjunction with a 
tabulation or independently, by what is known as the card count. This feeds 
1 into any desired counter at the passage of each card. The control and printing 

mechanisms operate as before. . 

The more elaborate forms of tabulator have a number of auxiliary devices 
which considerably increase their potentialities and flexibility. The two most 
important in the British machines are the rolling feature and distributors. In 
the rolling total tabulator, numbers can be transferred or rolled from one counter 
to another, either positively or negatively, according to instructions issued by 
the control mechanism. Distributors enable numbers read from a field to be 
directed to different counters, and also enable numbers taken from one counter 
to be directed to different counters in rolling, or to different print banks. 
When used in the first manner the distributors operate on instructions read 
from some other column of the card. 

A single distributor, for example, enables positive and negative numbers 
in the same field to be distributed into two counters according to their sign 
(punched in code in another column). By rolling the total of the negative 
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itself and several other numbers B , C, D, etc., can be carried out simultaneously 

without additional sorting or tabulation. , , . oanable 

The American tabulators do not have the rolling feature, but are capable 
of ™inro“”,ec, addition or subtraction according to card designator 
cSS » the rolling total tabulator, the counting wheel, may be grouped 

L “toi :„tly at will, which enables the — XS.ors whfc 
more efficiently. These tabulators, however, have no distributors, wn 
Ttot. somewhat from their usefulness in the analysis of survey data. 

5.14 The reproducing summary punch 
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A further use in survey work is the calculation of percentages, index 
numbers, etc. A simple example will illustrate the procedure. Suppose it is 
required to express the numbers B as a percentage of the numbers A, both 
sets of numbers being punched on the cards. By suitable sorts we can assemble 
the cards in batches such that the percentage value for all the cards of each 
batch is the same. Cards for which A has the value 45, for example, arid a 
value of B between 0 and 2 will have a percentage value of 0 (to the nearest 
]0 per cent.), those with B between 3 and 6 will have a percentage value of 
10, etc. Master cards are therefore made out carrying the values 


A 

B 

Percentage 

45 

0 

0 

45 

3 

10 

45 

7 

20 


etc., with similar sets for other values of A, and these are added to the pack, 
which is then sorted into numerical order of the A’s, and* of the B s within 
A’s. All the A’s having a value of 45, for example, will now be together, those 
with B between 0 and 2 being preceded by the first of the above master cards, 
those with B between 3 and 6 by the second, etc. If the whole pack is then 
passed through the reproducing punch the correct values of the percentages 
will be gang-punched into the remaining cards from the master cards. 

The disadvantage of this procedure is that a large number of master cards 
are required to cover with any high degree of accuracy fields which are at all 
extensive. Time and expense is therefore involved in the preparation of the 
cards, and they also add to the total volume of sorting required. The method 
is therefore most suitable when ratios and indices of low accuracy are required 
for large batches of data. In the analysis of the National Farm Survey it was 
used for the calculation of the percentage of the acreage of individual farms 
which was arable (12 classes), rent per acre (12 classes), etc., and for the 
combination of several items of qualitative information into a single index. 
A description of the procedure, and a method of preparing master cards by 
use of the gang punch, is given by Kempthorne (1946, B). 

5.15 The multiplying punch 

The multiplying punch is designed to read two numbers from a card, 
calculate the product and punch the result, with suitable rounding-off, in any 
field of the same card. It can also be set to read the multiplier from inter¬ 
spersed master cards, so that the numbers on the whole of a group of cards 
are multiplied by the same factor. Cross-footing multiplying punches will 
also add or subtract two or three numbers read from the same card and punch 
the result, or add one or two numbers to the product of two other numbers. 

The chief use of the multiplying punch in survey work is in the calculation 
of products of various kinds prior to summation. It can be used for the 
calculation of ratios if the reciprocals of the divisors are punched on master 
cards and the cards are then sorted according to their divisors. It is, however. 
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relatively slow in operation. Consequently when a large amount of material 
has to he handled and high accuracy is not required in die products or ratios, 
the use. of gang-punching with interspersed master cards is generally to be 
preferred. 

5.16 The collator 

The collator will take two packs previously sorted in numerical order on 
up to 16 columns and will combine them into a single pack, so arranged that 
all cards of pack B which carry a given code-number follow immediately on 
the cards of pack A carrying the same code-number. It will also select matching 
cards from pack A and pack B, rejecting cards of pack A with code-numbers 
which do not occur in pack B and vice versa. Furthermore it can be used to 
select cards from one pack which correspond in code designation to the cards 
contained in a second pack. This last property is sometimes of value in survey 
work, since it can be used to pick out trailers associated with a given set of main 
card, 8 - « it is so used, however, the precaution should be taken of matching 
up, the rest of the trailers with the remaining main cards so as to provide a 
check that no trailers have been erroneously excluded. 

5.17 Systems of coding for punched cards 

Wheri punched cards are used all non-numerical information will require 
to be coded in some quasi-numerical form. Each column of the card can 
represent a classification of up to twelve classes, which are denoted by X, Y, 

Numerical quantities do not in general require to be coded, but may 
require roundmg-off in order to economize card space and reduce the counter 
capacity needed m the subsequent tabulations. Roundihg-off, however is a 
tiresome operation and reduction of a number by r single digit should’only 
be undertaken if really necessary. Often rounding-off can be avoided by 
issuing suitable instructions to the field investigators regarding the number of 
figures required in the results. 

In certain cases it may pay to code a numerical quantity by grouping. This 
1S , particularly advantageous when the quantity is primarily required as a basis 
of classification and does not need to be summed. If . summation is needed 
large values must not be too coarsely grouped, and there must be no “ open ” 
group : it is not sufficient for all the high values to be included in an “ over — ” 
!?r ss ’ 3_ hls often necessitates an additional column or over-punching in the 
X and Y positions. If there is to be summation the grouping must also be 
chosen so as to avoid the bias which can arise through the frequent occurrence 
of particular values (see Example 7.2.b), though such bias is relatively 
unimportant when only comparative results are required. 

The construction of a code for complicated items of information, e.g. 
questions with a large number of possible alternative answers, is often a difficult 
task, since the conflicting aims of recording the information adequately, 
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between columns which are punched should be avoided by arranging all the 
blank columns at one end of the card. 

It is generally advisable to leave a few blank columns in order to accommodate 
gang-punching of such items as index numbers and grouped classifications 
required m the course of the analysis. The number of blank columns likely 
to be required depends very greatly on the type of work. They may not be 
necessary at all m simple censuses for which the exact form of analysis can 

e prescribed at the outset. If in the course of the analysis it is found that 
additional columns are necessary, these can be made available by reproducing 
t e cards, with the omission of items of information which are no longer 
required (or are not required in association with the additional columns) 8 
If more than one card is needed to accommodate all the relevant information 
on a mut, it is necessary that items of information that require to be associated 
in the analysis should ultimately appear on the same card. Apart from the 
code number each item of information need only be punched on one card 
the requisite items being transferred to other cards of the set by the reproducing 

STi In K C f rtam C , ases en ^ reI y new cards may require to be constituted in 
this way, but in others sufficient blank columns may be left on the cards 
originally punched to accommodate the additional items. Convenience of 
punching should still be one of the prime considerations in the arrangement 
of the original cards. It is generally better to make an additional set of cards 
sequence Chmg ^ t0 Separate informa tion which falls in a natural punching 

If the units which will form the basis of the analysis are not the sampling 
° r lf tbere 18 a hierarchy of units, the card arrangement presents more 
ifficult problems. This situation is of fairly frequent occurrence. As an 
example we may consider suitable card arrangements for surveys of human 
populations in which the household is the sampling unit, and in which both 

the US a e na!yti s and mdmdua S require t0 be treated 38 units in different parts of 

If the survey is a simple one, in which the whole of the information relating 
to the household can be accommodated in say 20 columns, and the whole of 
the information relating to each individual in say 10 columns, it will be possible 
to accommodate all the information relating to a household of up to five 
individuals on one card, leaving 10 columns blank for subsequent use. With 
this arrangement of the card, households of 6 to 10 individuals will require 
one trailer, 11 to 15 two trailers, etc. ^ 

This arrangement has the disadvantage that tabulations relating to individuals 
require five separate passages of the cards through the tabulator, with sorts 
etween each passage, since individuals may appear in every one of five divisions 
nf , hl l ° eS ", 0t necessitate the passage of a larger total number 

s hrough the tabulator than if each individual were represented on a 
separate card, but it does result in the summaries being produced in five 
separate parts. These will then have to be combined by hand, or by punching 
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and tabulating summary cards. The total time of tabulation will also be 
somewhat increased owing to the increased number of printing and clearing 

CyCl The alternative is to have a separate card for each individual. This udU 
in any case be necessary if the amount of information relating to individuals 
is such that more than 10-15 columns are required per individual. It is usually 
necessary to have at least some of the information relating to the household 
reproduced on each individual card. If the household information requires 
say 40 columns and the individual information 30 columns the household and 
the first individual in it can be punched on the first card. For the subsequent 
individuals only the code number of the household need be punched, the 
remainder of the household information being gang-punched subsequently. 
Fo“nvenSnce of punching, the code number should be sc.placed on the 
card that it is contiguous to the individual information Each individual 
should also be allotted a serial number within the household in some orderly 
sequence e.g. head of the household, wife, children by age and other members 
by Se If still more space is required for the household information a separate 
card'or carls will have" to be given over to the household, with a selection of 
this information gang-punched on the cards for individuals. t 

h The use of asepf rate card for each individual has one serious drawbacL 
Although the whole of the information relating to a household and to all the 
individuals in it is recorded on the cards, it is impossible to classify households 
according to the collective characteristics of the individuals contained in them 
We can g of course pick out households containing one or more individuals 
having a given characteristic, e.g. we can select all households containing babies 
of under a year old by selecting the cards representing such babies. But w 
cannot for example, classify the households according to number and age of 
Sren unless tbi information has already been coded and recorded in 
summary form on the household part of the card. . . , 

In order to enable households having given collective characteristics to be 
picked out on the sorter, it is necessary for the whole of the relevant information 
concerning all the individuals in the household to be recorded on a single 
card If therefore, individuals in the same household are spread over more 
than one 'card we must construct a new set of household cards containing t e 
Levant particulars of all individuals in the household Some upper limit 
must be imposed on size of household, households of above this size being 
dealt with by hand where necessary. Thus with a 5-column field for household 
code number, and the reservation of 15 columns for subsequent recording of 
new classifications, it is possible to allot 5 columns to each .<f ^ 

The construction of such cards can be effected by reproducing on to the^ 
set of 5 columns of the new cards the relevant columns of all individual 
numbered 1, together with the household code numbers, fo lowed^ by all 
individuals numbered 2 and so on. When the new household cards have 
been constructed they can be classified and machine coded according 
to their characteristics by sorting and gang-punching, using master cards. 
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and check computations may be carried out by the same or by different 
computers. 

In a computation which is checked by working over the figures a second 
time the main causes of error may be classified as follows 

(1) Failure to check a value which is in error. (This includes partial failure, 
e*g. recalculation of the numerical value without checking the position of the 
decimal point.) 

(2) An identical error in both the original and check computation. 

(3) Different errors in the original and check computation which produce 
the same error in the next written or examined figure. 

(4) Failure to notice disagreement between the original and check 
computations when the original is in error. 

(5) Alteration of the original to agree with the check when the check is 
in error. 

(6) Failure to carry forward correctly corrections necessitated by a 
detected error. 

(7) Incorrect procedure in the original which is followed by the checker. 

The danger of identical errors is obviously considerably greater when the 
same computer carries out both computations. Indeed at first sight it might 
appear that if a reasonably high standard of computing is attained, the chance 
of two computers making an identical error would be somewhat remote. There 
are, however, certain errors which are particularly common, such as, for 
example, the mis-reading of a badly written figure, incorrect location of the 
decimal point, the reversal of a pair of figures, e.g. 49,876 for 48,976, and 
duplication of the wrong figure, e.g . 74,496 for 74,996. 

Duplicate computations, if properly carried out, i.e. not compared too 
frequently and corrected independently if an error is detected, will very greatly 
reduce the chances of most of the above types of error. Indeed a properly 
conducted set of duplicate computations done by different computers of 
reasonably high standard may be regarded as sufficiently accurate for almost 
all census and survey calculations. The checking of a single set of computations, 
however, even by different computers, cannot be expected to eliminate all 
errors, and if no other checks exist must be looked on as an unsatisfactory 
procedure for the more important computations. 

Fortunately in many types of computation we do not have to rely solely 
on checks by repetition. A great deal of census and survey analysis is subject 
to cross checks of various kinds. Thus the counts relating to each of a number 
of classifications should all add up to the same total count. The same is true 
of totals of quantitative measurements. Indeed, where cross checks of this 
kind are not available it is often best to check a set of totals by calculating 
the grand total rather than by checking every individual total. The use of a 
grand total requires only a single comparison, which can consequently be made 
with some care. If all the individual totals have to be checked, there is serious 
danger that some discrepancy may be missed. 
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The use of cross checks in place of detail checks has a further advantage 
which is not so immediately apparent. If such a check fails to agree a good 
deal of re-computation is usually required to locate the error. This 
automatically tends to raise the standard of computation. 

It is important to recognize which types of error will be detected by cross 
checks and which will not be so detected. If a total check is relied on, for 
example, all entries in a table are checked, but their locations in the table are 
not checked. Thus it is possible for quantities to be entered under the wrong 
headings. Such errors can be minimized by observing a standard order in all 
tables and always entering the values in the standard order. 

One of the main functions of any checking system is to preserve a high 
standard in the computations. Very rigorous standards of work should be 
imposed. If more than a very few errors are found to exist a complete 
re-computation should be made. No erasures or fair copies must be permitted, 
and thorough inspection of all alterations must be made to see that errors 
are properly rectified. In large-scale routine work a record should be kept of 
the errors made by the different members of the staff. The supervisor must 
be ready at all times to resolve difficulties of procedure, otherwise the computers 
will undoubtedly attempt to resolve such difficulties amongst themselves, 
possibly incorrectly. A high standard of neatness must be insisted on. All 
figures must be legible and unambiguous not only to the writer but to others. 
This is particularly important in coding. Confusion between 6 and 0, and 
between X and Y, gives rise to many errors. 

The coding and punching of the data of a large-scale census or survey 
presents its own organizational and checking problems. Even if a good deal 
of the information has to be coded it is advisable to record the coding on the 
questionnaire forms if possible, since transcription of pre-coded and numerical 
information is thereby avoided. If this is not possible a coding sheet may 
be used. This consists of an auxiliary printed form on which the information 
is entered in code, the form being so arranged that it is both convenient for 
the punch operators and for use in minor hand-analyses, if these are found to 
be required. 

In certain cases the field investigators can be asked to code their own 
material at the end of each day’s work. In general, however, this is not 
likely to be satisfactory, as it is difficult to preserve consistent standards of 
coding. 

If the coding is at all difficult it is best to code one or a small group of 
items of information on a batch of forms at one time, rather than to code the 
' whole of each form in turn. Whatever the detailed procedure adopted, however, 
it is essential for the supervisors to carry out adequate checks to ensure that 
correct and consistent standards are maintained. 

In addition to routine checks it is often possible to impose checks of various 
kinds for gross errors and inconsistencies of coding. A special type of sorter, 
which picks out cards carrying a given code in a number of columns, can be 
used for this purpose. 
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Table 5.21—Analysis of the National Farm Survey : 
CONSTITUTION OF SAMPLE 


Size-group 

(acres) 

Average size 
(acres) 

- * No. of 
holdings 

Sampling 
fraction 
(per cent.) 

No. of 
holdings 
in sample 

5- 25 

12 

101,450 

5 

5,072 

25-100 

55 

111,360 

10 

11,136' 

100-300 

165 

65,210 

25 

16,302 

300-700 

413 

11,150 

50 

5,575 

Over 700 

1,035 

1,430 

100 | 

1,430 



290,600 

(13-6) 

39,515 | 


Had a uniform sampling fraction been used in place of a variable sampling 
fraction, a sample over twice as large would have been required to give results 
of the same accuracy on such items as the percentage of land under different 
systems of tenure. By the use of a variable sampling fraction results of ample 
accuracy were obtained from an analysis covering only one-seventh of ail the 
holdings. This not only considerably reduced the amount of coding and 
machine work, but also enabled work to proceed as soon as the information 
tor the sample farms had been assembled and abstracted. In consequence 
it was possible to make the results of the analysis available a year or two sooner 

than would have been the case had the whole of the material had to be abstracted 
before analysis. 

5.22 Adjustment of the results to compensate for defects in the sample 

When the sampling procedure is defective in one respect or another 
attempts are sometimes made to adjust the results in order to compensate for 
the defects. Thus it may happen that owing to defects in the selection of the 
sample or m the collection of the information, different classes of the population 
are found to be represented in incorrect proportions in the final sample. In 
such cases it is possible to adjust the results by weighting the different classes 
in such a manner as to compensate for the errors in the proportions. 

This procedure must be clearly distinguished from the procedure of 
stratification after selection mentioned in Section 3.3. The validity of the 
latter procedure depends on the fact that the sample as a whole is random 
and therefore the selection from within strata is also random. If the 
proportions in the different classes are different because of defects in the 
sampling procedure, however, it is most unlikely that the selection from within 
these classifications will be fully random. Any adjustment of the type envisaged 
therefore, although it may somewhat improve matters, must not be expected 
to eliminate by any means the whole of the defects. 
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Stratification after selection is a special case of the use of supplementary 
information of all kinds. Such adjustments, whether planned at the outset 
or decided on subsequently after examination of the data, are quite justified. 
The essential difference between these adjustments and between adjustments, 
of the same type made in order to compensate for defects in the sampling 
procedure is that in the former the selection is random, except for permissible 
restrictions, whereas in the latter it may be biased in various ways. 

In general, if the sampling procedure is defective it is best to report the 
results obtained without adjustment. At the same time data should be given 
indicating, so far as is possible, the deviations of the sample from the expected 
distributions. Thus if the proportions in the different classes of a classification 
are known for the population these may be presented alongside the parallel 
classification of the sample. Similarly the sample means of quantities for 
which the population means are known may be presented for comparison.. 
Occasionally an adjustment of some of the more important values derived from 
the sample may be considered worth while, but in such cases the unadjusted 
results should also be presented. 

The above remarks apply primarily to samples for which the sampling 
procedure is markedly defective. In cases in which there are slight defects 
such as a minor degree of non-response, the application of some small 
adjustment, if this appears necessary, is more justified. If such adjustments, 
are made, however, the fact should be clearly stated and their magnitude 
should be indicated. 

The simplest way of dealing with non-response is to regard the non¬ 
respondents as similar to the remainder of the sample, i.e. to treat the sample 
as if it were a sample on a smaller number of units. With a stratified sample, 
the non-respondents in each stratum can be treated as the equivalent of 
respondents in that stratum. Alternatively some other appropriate classification 
can be used, as indicated above. 

If follow-up methods have been used and there has been a good response 
to the follow-up, initial non-respondents who subsequently respond can be 
treated as a sub-sample of all initial non-respondents and weighted accordingly. 
It is clear that if there is any difference between respondents and non-respondents, 
the final non-respondents may be expected to be more like the initial non¬ 
respondents than the general population. This procedure was first, so far as. 
I know, suggested by Professor D. V. Glass, and was used by him in the 
analysis of the Family Census (Section 4.10). . 

In this survey those who failed to provide the enumerator with the required 
information were sent a letter further explaining the purposes of the survey 
and requesting that the form be sent direct to the Royal Commission. Of the 
230,000 initial non-respondents (i.e. 17 per cent, of the whole sample), 50,00b 
responded to this appeal. This 50,000 therefore constituted a sample (though 
a non-random one) of the 230,000, and the first 12,000 of the 50,000 replies, 
were combined with the remainder of the sample with a weight of 230/12. 
This procedure was found to give overall birth-rates which corresponded 
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very closely to those already known from other sources, whereas the "original' 

thTm!w V t e fi' rateS .T hlCh WerC substantiall y t0 ° high, owing to the fact that 
the majority of the initial non-respondents were women with few or no children 


5.23 Critical analysis of survey data 


At the outset a clear distinction must be made between the types of deduction 
that can be made with certainty from survey data and the types that are 
speculative. If in a nutrition survey, for example, we find that children of 
arge families are worse fed than children of small families we can draw the 
defimte conclusion that size of family is associated with malnutrition of the 
children, and we can give quantitative estimates of the degree of malnutrition 
actually existing amongst children of families of different sizes. We cannot 
however assert with certainty that size of family is the cause of this malnutrition,* 
1 °t!, the faC V ha j in large families the income per head is automatically 
relationship * & ^ mC ° me W ° uld lead Us t0 ex P ect an underlying causal 

Even in situations where a definite causal relationship is known to exist 
deductions as to the magnitudes of the effects of given factors can never be 
made with certainty from survey data. We may, for instance, find that fields 
receiving fertilizers give higher yields per acre than fields without fertilizers. 
Yet we cannot attribute the observed differences solely to differences in fertilizers. 
The farmers using the fertilizers may be farming better land, they may be 
growing higher-yielding varieties, and they may be carrying out their farming 
operations with greater skill. 6 

, Cb : arly ^finable extraneous factors which may influence the estimates of 
the effects of other factors can be determined in the course of the survey. 
Under certain circumstances the disturbance due to them can be eliminated 
by ™ e . thods analysis which will be outlined in this and the following section. 
But there will always be other undetermined and' possibly unascertainable 
factors which cannot be taken into account. 

In order to determine with certainty the magnitude in the causal sense of 
the effect of any given factor, experiments must be undertaken. Surveys 
cannot be regarded as satisfactory substitutes for experiments. Nevertheless 
they are of value in situations in which experiments are difficult or impossible 
though in such cases all conclusions must be tentative. They are also of value 
as a preliminary to experimental work,, since they frequently indicate the 
iactors that are likely to be most worth investigation. 

If, however, survey data are to be effectively used for either of these two 
purposes it is important to have means of eliminating the effects of extraneous 
factors m so far as this is possible. 

A simple example will illustrate the problem involved. Table 5.23.a 
gives the numbers of fields, totals and means of yields per acre of a sample of 
901 potato fields classified according to (a) the five regions into which the 
country was divided, and (b) the five varieties included in the survey. These 
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results were obtained in the course of an investigation into the blackening of 
potatoes on cooking, the data on yields being collected from the farmers when 
the samples were taken, together with a considerable amount of information 
orf fertilizers, cultural practices, etc. Approximately 180 fields were selected 
in each region. The selection within regions was not strictly random, but can 
be regarded as substantially so for the purpose of the present discussion. I he 
sample was confined to the five named varieties, but was not stratified by 

Vat From the results of Table 5.23. a it is apparent that the mean yield is highest 
for the Scottish region, and is also higher for the Northern region than for the 
remaining regions. There are, however, evfen larger varietal differences in 
yield. Consequently if the varieties were grown in different proportions in 
the different regions the regional differences are likely to be influenced by 
varietal differences. 

To examine this point it is necessary to construct the two-way classification, 
regions X varieties. This is shown in Table 5.23.b. The values of 
Table 5.23.a appear as marginal totals in this table. 


Table 5.23. a—P otato survey : numbers of fields and totals and means 

OF THE YIELDS PER ACRE (TONS) 


(a) Classified by regions 



No. 

Total 

Mean 

Scotland 

174 

1,482 

8-52 

North. 

177 

1,425 

8-05 

E. Midlands 

189 

1,415 

7-49 

South 

182 

1,324 

7-27 

West . 

179 

1,368 

7-64 

All . 

901 

7,014 

7-78 


{b) Classified by varieties 



No. 

Total 

Mean 

Majestic 

393 

3,292 

8-38 

King Edward 

250 

1,563 

6-25 

Great Scot 

56 

461 

8-23 

Arran Banner 

84 

766 

9-12 

Kerr’s Pink 

118 

932 

7-90 

All 

901 

7,014 

7-78 
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TABLE 5.23.b-POTATO survey: two-way classification of the data by 

regions and varieties 

Numbers of fields 


Majestic . 
King Edward 
Great Scot 
Arran Banner 
Kerr’s Pink 
Total 



Totals of yields per acre {tons) 


Majestic . 
King Edward 
Great Scot 
Arran Banner 
Kerr’s Pink 
Total 



Means of yields per acre {tons) 


Majestic . 
King Edward 
Great Scot 
Arran Banner 
Kerr’s Pink 
All . 


Scot. 

North 

E. Mid. 

South 

West ^ 

All 

9-46 

8-19 

842 

8*15 

8*27 

8*38 

7-65 

5-71 

6-34 

5*87 

5*49 

6-25 

9-22 

7-57 

— 

8*17 

7*78 

8*23 

9-12 

9*24 

— 

7*22 

9*54 

j 9*12 

8-29 

7*61 

— 

— 

6*61 

7*90 

8-52 

8-05 

7-49 

7-27 

7-64 

7*78 
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estimates the differences between regions that would occur in the hypothetical 
situation in which equal numbers' of fields of all varieties were grown in each 
region. Method 2 gives estimates appropriate to the hypothetical situation 
in which the different varieties are grown in the same proportions in all regions, 
these proportions being equal to the average proportions for the whole country. 
Only if the difference^ between the different varieties are the same for all 
regions will the estimated differences be the same. 

Method 2 has two advantages over Method 1. It is in general more 
accurate, since greater weight is on the average given to the cells containing 
the greater numbers of units. It also gives estimates which refer to a 
hypothetical situation more in conformity with that actually existing. 

Table 5.23. c— Potato survey : unweighted means of sub-class means 
(a) omitting E. Midlands and Southern regions, (b) omitting Great 
Scot, Arran Banner and Kerr’s Pink 




Unadjusted 

Means adjusted 




means 

to sample 

mean 



(a) 

(b) 

(a) 

(6) 

Scotland . 


8-75 

8-56 

8*55 

8*98 

North 


7-66 

6-95 

7*46 

7*37 

E. Midlands 


— 

7*38 

— 

7*80 

South 


— 

7*01 

— 

7*43 

West 


, 7-54 

6*88 

7-34 

7*30 

Mean 

7*98 

7*36 

7*78 

7*78 

Majestic . 


8-64 

8*50 

8*44 

8*92 

King Edward . 


6-28 

6*21 

6*08 

6*63 

Great Scot 


8-19 

— 

7*99 

_ _ 

Arran Banner . 


9*30 

— 

9-10 

_ _. 

Kerr’s Pink 


7-50 

— 

7-30 

— 

Mean 

7-98 

7*36 

7*78 

7*78 


It will be recognized that the marginal means of the sub-class means, 
whether weighted or unweighted, do not contain the whole of the information. 
If variety P yields more than variety Q in one region and less in another, for 
example, this fact can only be established from the sub-class means. Under 
such circumstances any comparison of the regions based on equalization of 
the proportions of the varieties represents an over-simplification of the real 
situation. 

Neither Method 1 nor Method 2 can be applied to the whole of tables 
in which there are blank cells. Even if there are no blank cells neither method 
will be very satisfactory when there are certain cells which contain so few 
units that the corresponding cell means are very inaccurate. Thus in the 
present example the relatively large difference between Scotland and the 
Northern region in column (b) of Table 5.23.C, in contrast to column (a), 
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is in part due to the fact that Arran Banner has given a larger yield in the 
Northern region than in Scotland. This may well be due to sampling errors, 
•since there are only 8 fields of Arran Banner in Scotland. 

There are two further relatively simple methods which are of value in such 
circumstances. 

(3) Weighted means of differences of sub-class means 

If only two classes (cross-classified by a second classification into a number 
of sub-classes) are to be compared, a weighted mean of the differences of each 
pair of sub-classes can be taken. Maximum accuracy will be attained when 
the weights are inversely proportional to the squares of the standard errors 
of these differences. 

With independent samples in each sub-class the square of the standard 
error of a difference is equal to the sum of the squares of the standard errors 
of the two means (Section 7.5). Under certain circumstances, which will 
be apparent from a study of Chapter 7, and in particular when the selection 
from within sub-classes is effectively random and the standard deviation per 
unit is constant, the standard errors are inversely proportional to the square 
roots of the numbers of units in the sub-classes (Section 7.1). In this case 
the reciprocals of the weights must be taken proportional to the sums of the 
reciprocals of the pairs of sub-class numbers, i.e. if n x and n 2 are a pair of 
sub-class numbers the weight can be taken equal to w> where 

1 _ ! JL 
w n^ n 2 

The calculations are shown in Table 5.23.d. The weight for Majestic, for 
example, is given by 1/w = 1/37 + 1/75. Weights may be taken to the nearest 

Table 5.23.d— Potato survey: estimate of difference of Scottish and 
Northern regions from weighted mean of varietal differences 



Difference 

z 

Weight 

w 

wz 

Majestic 

+ 1*27 

25 

+ 31-75 

King Edward 

+ 1*94 

10 

+ 19-40 

Great Scot 

+ 1-65 

8 

+ 13-20 

Arran Banner 

- 0-12 

7 

- 0-84 

Kerr’s Pink . 

+ 0-68 

24 

+ 16-32 

Total 

Weighted Mean 

+ 1-08 

74 

+ 79-83 


whole number. They can be rapidly calculated from a table of reciprocals 
or on a slide rule. The weighted mean, + 1-08, is obtained by dividing the 
total of w% by the total of w. 
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(4) Pooling of classes 

^.~ ases * n ^bich inspection or use of the previous methods indicates that 

pooled 1 ThSofteTl" - er ? in K? f , thC C ? SSeS are smaII > such classes be 
fable. h f eliminates blanks and very small numbers in the sub-class 

betwee^thrS^ FhTl inS P ection “! dicates that there is little difference 

Tab™ 23 c The, 8 regK>nS - J** * C ° nfirmed b 7 the " in 
iable 5.23.c. These regions may therefore be pooled. This pooling will 

fhTgifenT SHI 46 di ff ences between the last three varieties 8 than 

i weight, following Method^. ^ 8 SCOtl “ d b ' indud ' d “ 

The calculations are shown in Table 5.23.e. It will be noted that the 

fo°r°thf f 311 F e ? e f ed by adding the numbers of fields and totals of yields 
for the four English regions from Table 5.23.b. Y 


Table 5. 23.e—P otato survey: effect of pooling English regions 



English regions (pooled) 

Scotland : 

Mean 

Weighted 

mean 

No. 

Total 

Mean 

Majestic . 

356 

2,942 

8-26 

9-46 

8-50 

King Edward . 

208 

1,242 

5-97 

7-65 

6-3! 

Great Scot 

38 

295 

7-76 

9*22 

8*05 

j Arran Banner . 

76 

693 

9-12 

9T2 

9-12 

Kerr's Pink 

49 

360 

7-35 

8-29 

7-54 

All . 

727 

5,532 

7-61 

8-52 

7-79 

Weight . 

4 J 

1 



5.24 Method of fitting constants 


Method?'c a a n e h m ° re ^ an UV0 daSSeS between which Terences are required, 
Method 3 can be used to compare each pair separately. It will not however 

be^erT S Tn Se ?°l eStimateS ’ U the su “ of ^e «hJt^d5S2S 

A wdC Tht an fl betWeen A and C wiU n0t exactJ y equal that between 
nd C ' Thls 1S a reflection of the fact that Method 3, though fully efficient 

sensef wh^i^h^e?re C asses table, is not fully efficient fin the^SS 
ser^e) when there are more than two classes. In this case, in addition to direct 

etc P r?n S I, nS b ? Wee ?,^ md B > indirect comparisons of A with C and C with B 
e tc., can be made. When the sub-class frequencies are not proportionate these 
COn “ b “ le information .To Bfa m 
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P occurs in regions A and C only, and variety Q in regions B and C only, 
comparisons bftween regions A and B with differences beUveen vaneUes 
eliminated can only be made by indirect comparisons mvdvm 
Estimates of maximum accuracy, which, as might be. 
consistent, can be obtained by fitting constants by the method of least squares 
(Y a te S 1934 A ; Snedecor, 1934, A ; Snedecor and Cox, 1935, A), bnedecor 
£ drawn attention to the fact that if the numbers of units in Ae 
different sub-classes are nearly proportionate, Method 2 can be used with 
appreciable loss of information in place of the more laborious method of 

fitti Stevens St 0948, A) has given a simple arithmetical method of obtaining the 
vahmttftiie estimates defied by fitting constants, ^e pjocri™* w^ is 
one of successive approximation, is illustrated for the data of our example 

in Table 5.24. 

Table 5 . 24 -Potato survey: estimation of varietal and regional 

DIFFERENCES BY FITTING CONSTANTS 

I | ^ xt xr Ayr q w Starting values Final 

Sc. N. E.M. S. W. aQd co £ ections values 

901 174 177 189 182 179 (1) (2) (3) W 

, st s s s *a *s s ±.s 3 3 ±.3 «| 

14 a • s = d 5 ±s _s ±‘s is 


( 1 ) 4-.74 +-27 —-29 —-51 —'14 

Starting (2) +-09 --48 +-36 +-14 --16 -<~ 

V aM S (3) -37 -3^ — 

corrections 4 +-13 +-01 —06 —06 —03 

(0) +-03 + -01 —-02 —02 -00 

Final (6)~ +’99 —19 —01 --45 --33 

values (7) 8-77 7-59 7-77 7-33 


The number of fields in each sub-class, reproduced from Table 5.23 b, 
is shown in the body of the table, with marginal totals above and to the lett. 
Column 1 and row 1 give the deviations of the varietal and ^gioMl mean 
Yields from the general mean 7-78. These are obtained from Table 5.23 a 
S s 23.b. Thus, for Majestic, 8-38 - 7-78 = + 0-60. These data are all 

that are necessary for the estimation process. . * 

The approximation should be started with the set of deviationswluch 
show the biggest differences, in this case the varietal deviations. We first 




problems of execution and analysis sect. 5.24 

ffSfsSfTil' 1 ^ T the “ v " i « al d «™tions 
with signs reversed in row ^T„ 2' • T r ® gl0nal deviations are shown 
column are ZlnJZ turn hv ^ **. nUmberS of fieIds » each 

divided by the total number of^fielrf . evlatl ° ns In coIumn 1, summed and 
the deviation is dS m each coIumn - Thus for Scotland 

(+ 60 x 37 — 1-53 X 42 + .45 x 18 + 1-34 x 8 -f -12 x 69)/174 

.... = — 14-96/174 = — -09 

- 14 S -S lie f °bVfn e °f-? 0lumn \ lt is best to record the sums of products 
the total’of these sums oTnrod t ^ S - provlde * a check against minor errors’ 

cd» r . md taI £ Iumn s of f-2? 

differs from zero only because of rounding-off errors. ’ + 22 ’ 

deviation if there^ere^ n^regional differences 1 ^ * S 

obtained by adding tjl J^i*™** “ 8h °»" » * ">*<■ is 

by Sona™ iSrelT”°T? “ col “ n " 1 !’ however, are themeelvea affected 

cor«y: n rare 0 r„lt r c„rjr1. 0 T h ^ h et 

r mo 1 

which ar^ehowt ZctUZOT T a “»» d « »f varietal cortections, 

of regional corrections, which'are shown Z “w T " 8ed “ *™ * ,hM SM 

*>“ “nections ■* now so 
mean 7 78 t * I , T !* ese devia tions may then be added to the general 

valne.'rf tZZ Ttot dTVT ereat ‘ > ' fromtI “ 'orresponding 

tables 5.23.c, 5.23.d and 5.23.e, but they do differ substantially 
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from the values of Table 5.23.a. The differences between Scotland and the 
Northern region, for example, are as follows 


Regional means over all varieties (Table 5.23.a) •• 
Unweighted means of sub-class means (Table 5.23.c) | 


(*) 


Weighted differences (Table 5.23.d) 
Fitting constants (Table 5.24) 


0-47 

1-09 

1-61 

1-08 

1-18 


Equally the varietal means given in Table 5.23.e are very close to those of 

dtmonM.attS that the more direct methods used 
are caoable of giving satisfactory estimates. As is shown in Examp • , 
where the errors of these estimates are discussed, the third of 
differences viz. 1-61, is decidedly less accurate than the others, since th 
information provided by the last three varieties is not taken into account 
On the other hand, the agreement of the above estimates must be take 

as providing any indication of their real accuracy. Since f ^ the effeSs 
based on the same data, any disagreement is primarily a reflection of theettects 
of the various approximations on the efficiency of the estimation P • 
There are some further general points in connection with the method ot 

fitting constants which should be noted. 

The method will provide efficient estimates of the differences due to one 
classification, freed from the effects of the other classification if the true 
differences are the same for all classes of the second classification. In the 
terminology of the design of experiments, this is equivalent to thenon-existence 
of interactions. If the true differences vary markedly the method is 
inappropriate. Instead the individual differences should be considered or 
Method 1 or Method 2 should be used. (See sub-section (2) above.) 

As mentioned above, if one of the classifications consists of W» c ^es 
only, Method 3 is fully efficient for estimating the difference between these 
twJ classes. If estimates of the differences between the classes of he second 
classification are required, these may be derived by adjusting; the: class means 
in the manner followed in Table 5.24, assigning deinationsofplusandmmus 
half the difference to the two classes of the first classification, 
is in this case exact, and therefore no further approximations^ 

The method can be extended to multiple classifications having 
more sets of classes. In this case the data required are the general means for 
each main classification together with the two-way tables of nu “ b ^ O U “ ® 
corresponding to each pair of classifications. The numbers of units ‘n the 
individual cells of the three- or more-way table are not required. If th 
are only two classifications the general means for each mam classification, 
together with the numbers of units in the individual sub-classes, are requi • 
g The reporting of sub-class numbers of the relevant pairs of classifications 
should therefore be considered even in cases in which the reporting of the 
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wo P ^ 

then be studied subsequentIy ^ 

5.25 Preparation of reports 

necessary to embody rtf resuSTnT report add” C ° mpleted il is anally 

14 ~ r^irt d eform of tabies - d 

according to" the ° f SUch re P orts vary greatly 

are in general similar for sample cen™ f H pur P oses of the report but 

and surveys. We therefore do not propose "tTT “ d C ° mplete Censuse « 
There are, however, certain mxZI u° u , dlscuss them here, 
census or survey which do not arise (or ofT^'^ Kported on in a ^mple 

census These matters ,have been covered Vv!/ mportance ) in a complete 
referred to in Section 1.6) prepared bTZ rr^ . t memora ndum (already 
on Statistical Sampling, entitled Recommend ** Sub-Commission 
Of Repons on Samp U„ g ^ «< ^ M ion 

(1) General description of the survey 

following 8 point8. de Srof A^^ V 2 Q jf? 0l,1 f /. nclude “formation on the 
detailed technical sections of the report. q 116 fU ^ treatment “ the more 

(a) Statement of purposes of the survev A > • 

given of the purposes of the survey and thfT^ . lndlcation sh ould be 
expected that the results would be utilized ^ Wb * Ch U bad been 

<*> Description of the material covered A , 

given of the geographical region and descn Ption should be 

by the survey. I n a survey^ a human “ teg ? neS of f material covered 
necessary to specify whether^ uch categories^^hotel 0n ' k is 

(e.g. boarding houses, sanatoriums) vagrants ! reside nt s , institutions 
included. The reporter should guafd a"’ "4^ perSOnnel > were 
apprehension regarding the coverage of the survew” 7 P ° SSible W 
(c) Nature of the information collected tl • u 

considerable detail, including a statement f Sh ° U d be ported in 
collected but not reported on § ThefT °t ms of “formation 

' and .f-ant parts "f Se in ^ T™ ° f the «****» 

special rules for coding and classifying * f 1 ^ SUrvey (including 
impracticable, h may bf possible to^at ° f Value - « this is 

copies which may be obtained on request^ 3 Hmited nUmber of 

' * The introduction to the memorandum is omitted. 
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SECT. 5.25 sAiviJTi^v* -- 

(d) Method of collecting the data. Whether by interne, invent™*, 

(«) Sampling method. A “ “^'“etS'oMhfmpteflht proportion 
olrm'Sctemd, and atnmgem.nts for (oll.w-»p., * 

(/) » f ,h '‘”“ d sh ° ul<1 - b ° 

fe) ;Ex 

(*) Point or period [of rime]. Point or period of tone to whtch 

(j) Date and duration. The starting date and period taken for the field 

7 , 

(i) References. References should be given to any publ.shed repot, 

papers. 

<2) rZlZr^ - *• "<* b ‘ spedfiei ' 

/o\ Method of selecting sample-units f 

The reporter should describe the SlISuTon which 

and if it is no. a mndom selecrion he should md.e.m ^ 

J- sampling. 

(4) Personnel and equipment oreaniza tion of the personnel 

It is desirable tc > give ^ the primary data, together 

employed in collecting, p f, rPV imis training and experience. Arrange- 

with information regarding their p methods of processing data 

men,, i„, training inspecuon “”^”2” ,he accuracy bo.h’of 

brief mention may he made of the 

equipment used in processing the data. 

;^rSp“»?r P rSi“ri.« ..u »• ““»• * 
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PROBLEMS OF EXECUTION AND ANALYSIS 


sect. 5 . 25 * 


The critical observations of the technicians in regard to any part of the 
survey should be given. These observations will help others to improve their 
operations. 

(5) Costs 

An important reason for the use of sampling (instead of complete 
enumeration) is lower cost. Information on costs is therefore of great interest. 
Costs should be classified so far as possible under such heads as preparation 
(showing separately the cost of pilot studies), field work, supervision, processing, 
analysis, and overhead costs. In addition, labour costs in man-weeks of 
different grades of staff, and also time required for interview and journey 
time and transport costs between interviews, should be given. The compilation 
of such information, although often inconvenient, is usually worth undertaking 
as it may suggest substantial economies in the planning of future surveys. 
Efficient design demands a knowledge of the various components of cost, as. 
well as of the components of variance. 

(6) Accuracy of the survey 

(a) Precision as indicated by the random sampling errors deducible from 
the survey. Standard deviations of sample-units should be given in 
addition to such standard errors (of means, totals, etc.) as are of interest. 
The process of deducing these estimates of error should be made 
entirely clear. This process will depend intimately on the design 
of the sample survey. An analysis of the variances of the sampling-units 
into such components as appear to be of interest for the planning of 
future surveys is also of great value. 

(b) Degree of agreement observed between independent investigators 
covering the same material. Such comparison will be possible only 
when interpenetrating samples have been used, or checks have been 
imposed on part of the survey. It is only by these means that the 
survey can provide an objective test of possible personal equations, 
(differential bias among the investigators). 

(c) Other non-sampling errors, (i) Errors which are common to all 
investigators, and indeed any constant component of error (or “ bias ”) 
in the recorded information, will not be included in the estimates of 
the fandom sampling errors deducible from the survey results, 
(ii) Another source of error of the same type is that due to observation 
of quantities which do not correspond exactly to the quantities of which 
estimates are required : in a crop-cutting survey, for example, the 
yields of the sample plots give estimates of the amount of grain, etc., 
in the standing crop, whereas the final yield will be affected by losses 
at harvest, (iii) The possible effects of such errors on the accuracy 
of the results, and of incompleteness in the recorded information 
(< e.g . non-response, lack of records, whether covering the whole of the 
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survey or particular areas or categories of the material), should therefore 
be fully discussed, (iv) Any special checks instituted to control and 
determine the magnitude of these errors should be described, and the 
results reported. 

(d) Accuracy, completeness and adequacy of the frame. The accuracy of 
the frame can and should be checked and corrected automatically in 
the course of the enquiry, and such checks afford useful guidance for 
the future. Its completeness and adequacy cannot be judged by internal 
evidence alone. Thus complete omission of a geographic region or 
the complete or partial omission of any particular class of the material 
intended to be covered cannot be discovered by the enquiry itself, 
and auxiliary investigations have often to be made. These should be 
put on record, indicating the extent of inaccuracy which may be 
ascribable to such defects. 

(e) Comparison with other sources of information. Every reasonable 
effort should be made to provide outside comparisons with other 
sources of information. Such comparisons should be reported along 
with the other results, and the significant differences should be discussed. 
The object of this is not to throw light on the sampling error—since 
a well-designed survey provides adequate internal estimates of such 
errors—but rather to gain knowledge of biases, and other non-random 
errors. 

( f ) Efficiency. The results of a survey often provide information which 
enables investigations to be made on the efficiency of the sampling 
designs, in relation to other sampling designs which might have been 
used in the survey. The results of any such investigations should be 
reported. To be fully relevant the relative costs of the different 
sampling methods must be taken into account when assessing the 
relative efficiency of different designs and intensities of sampling. 
Such an investigation can be extended to consideration of the relation 
between the cost of carrying out surveys of different levels of accuracy 
and the losses resulting from errors in the estimates provided. This 
provides a basis for determining whether the survey was fully adequate 
for its purpose, or whether future surveys should be planned to give 
results of higher or lower accuracy. 
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CHAPTER 6 


estimation of the population values 

6.1 Possibility of alternative estimates 

popuMm vahS'fr^ tte^uumric'l* de " VMi< ”' of 'stimtcs of the 
4* example f'T d !” the . A 

sample values of a random samnle Tr ;« „ , ^ t e anthmetic mean of the 

an estimate of the meanTf t^nlnnl! 7 kn ° Wn that this mean Prides 
though i. wil, no, gsaStg 7“* 2* ““P 1 ' ™ d ™»». 
of the population. P ^ > be exactly equal to the mean 

of "VopulS «£ 2* Ssf n0 ' *1 °7 *-* —» 

central value, or the geometric mefn " 7 ’ 1116 median > *•* the 

logarithms of the sample values or eve !h anUl °Z ar ! thm of the mean of the 
values in the sample. ’ ” ^ mean ° f the hl ghest and lowest 

s ch s e 7” T d 7 medi ”' ' vhkh >* 

infonmtion associated with these values thSe“fmh 2 7 S “ pplen “ ntar S' 

which can be derived by taking account of * u * alternatlve estimates 

as is available, either quitive or”Z,”Lfo ThT^TT 5 ’ in£on » illi » 

in Section 3 . 3 , if the number* nf c \ Ahus, as has been mentioned 

the different strata of some stratified *”7 the wh ° e P°P uIatioi » falling in 
adjusted so that the different strata o r ^ ^ no ^ ^n, a ran( Aom sample can be 
Similarly, supplementary informed o^ 6 ^ 6 *? m . their correct Proportions, 
in various way7m pmri^JtZ7 Character can be 
than the simpler estimates which do no^ utilize'riiifTnfornI’f m ° re aCCUme 

differen^criteria^have to be Th ** ^ °^ Sa ™P lin g three 

i"Te - = 

distributed—the meaning of this ’ te • P°P u , atlon values are normally 
arithmetic mean dfprovidean 1*7 17 “ Chapter 7 -the 
apart from supplementary information \7 maxim ^ *** fr ° m biaS and ’ 
sufficiently simple comp^ona^ for ’ pmeS™ mZ*? * “ 7 ° 

mean will remain an unbiased estimate of 7 , • More im P or tant, the 

form of the distribution of the nonulSf f 7 po P ulatl , on mean whatever the 

be the most StS S'TZT^l'^ 

incidental advantage that the , The mea n also has the 

3S»«£ =?~ 
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Of ,h. compute™ involved. An, t ^“to 

efficiency requires advanced mathemati . t sour ce of bias ; where 

recommended estimates are free rom ar i S e indicated, 

this is no, the case the cucurns,h"le“ potion, bn, frequend. 
Estimates are required not only lor the w P . f stu dy. The 
also for the different parts of it which constitute tSfpopulat iol are in 
formulae of estimation which are app lc * . and need no t be discussed 
general applicable also to the separa e ’ 1 aD plied to the population 

Separately In certain cases adjustments 

estimates cannot be applied to the means 

if the population mean of a suppleme Y means of supplementary 

for the different domains of study*^en of" "ole population. In 

,o —»t 

SSTJS may - sbghdy seduced 

by the stratification. 

6.2 Notation . 

It is important, bod, in disengon of ^.pmbton £”, 

die mathematical lormifc, “i ™jL lat i,,n values themselves. The present 
the population values and t P P , t t b e population parameters 

convention in mathematical statistics is to .denote the poP £ Latin 

by Grefik letters and the corresponding^estimate si >y the « is 

letters. This convention, however, is for 

in any case more appropriate for m n yp present manual we have 
the finite populations met with in sampling In *ep^t , values 
for the most part adopted the convention of denoting the P P 

• by bold wo, the indie ^pe. 

themean ofthesevulues will be denoted ^e^true' mean^^the 

the estimate ^^sample W e shall have y = y, but y differs 

population by y. TWals for the population are indicated by 

SSi’irZti rsam“s % Tand — ove, the 

“r»r^io, estimation »e s toll to »« o^ £ 

supplementary infotmation, such as srte of u , h , ti ot m 

the selected sampling unto bu, dso or heW, ^ aed „ 

t fiTpt'nTian^ttog quanttotive supplementary infotmation 

«r*11 denoted bv X, 
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it may be necessar^to"Sal*^?timate°of^the 3 * 316 ^ ^ wh f e P°P ulation > 
standardized value of this var ;!^ ™ , f h po P ulatlon val ues for some 
a variate of Ais type ^ ^ ^ * wiU aIso be to denote 

The following is , ,« of Ae principa| symMs ^ ^ ^ 

0 , estimated regression coefficient. : 

working sampling fraction, 
exact sampling fraction, 
working raising factor (= 1//). 
exact raising factor (= 1/f). 

values . banging to a particular stratum i. 
number of units m the sample. 

"™ b l r . ° f ™ iK 1“ tllc population, and its estimate. 

’tkv'js p—* * *«■ 

ratio yjx. 

ratio Y/X, and its estimate, 
summation over the units of the sample, 
summation within stratum i. 
summation over the strata. 

number of units in the sample possessing a given attribute 
"td'it “^e thepopu “°"»—4 ■ *- «££ 

auppfcmmtery quantitative variate, such as size of unit 
total of x for the population, and its estimate, 
quantitative variate under investigation, 
total of y for the population, and its estimate, 
means of r, x, y, for the sample. 

means of * and y for the population, and their estimates. 

6.3 General rules 

of “2L" ^fl n,enB, »>"■ »PPlf to .11 types 

Rule 1 ~ The Population total of a quantitative variate 

Ruk 2—Number of units in the population 


/, 

f, 

§, 

g, 

i (suffix), 
«. 

N, N, 
P. P, 

r, 

*, r, 
S, 
Si, 
S, 

u, 

U, U, 

x, 

X, X, 

y, 

Y Y 

r, 5c, } y y 

% X, y, 
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Rule 3—The population mean of a quantitative variate 

Divide the estimated total of the variate for the population by the 
estimated number of units in the population. 

Rule 4-Proportion {or percentage) of units possessing a given qualitative character 

Proceed as for a quantitative variate, scoring all units possessing the 
g"en chapter as And all others as 0. Divide the estimated total score 
by the estimated number of units in the population. 

Rule 5—Ratio of two quantitative variates 

Estimate the totals of the two quantitative variates for the population 
by Rule 1 and take the ratio of these totals. (Rules 3 an P 

cases of Rule 5.) 

In cases in which the probability of selection of all units is the same 
(uniform sampling fraction or, in the case of multi-stage samp ^ 

overall sampling fraction), the first four rules can be A est matedby 

general rule that means and proportions in the population ar ® est ^ 
the corresponding means and proportions in the sample, and t 

in the population estimated by tnnltiplying the correspond,„g 

totals and numbers by the common raising factor. , in this 

The above rules cover most of the methods of estimation discussed in tp 

manual except those involving regression which cannot east^ s “ 1lsed 

in simple rules. They give rise to the formulae of estimation set out in 

following sections of this chapter. 

6.4 Random sample 

Number: 

N =gn 

N will be equal to N except for minor discrepancies due to the use of a 
working sampling fraction which does not give an integral uumber of sa p g 
units. If N is known then the true sampling fraction f equals «/N and the 

true raising factor g equals N/w. 

Mean of a quantitative variate : . 

y-y^y) (6 ' 4 ' a) 

Total of a quantitative variate : 

Y=^S(y) = Ny 

or more accurately, if N is known, and differs from N, 

Y' = gS(y)=*Ny 

Proportion possessing a given attribute : 

u 

p=« 


(6.4.b) 
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Number possessing a given attribute : 

U = gu = Np 

or, more accurately, 

U' — gw = Np (6.4. c) 

The same formulae of estimation will hold for systematic samples from 
lists, etc. 

Example 6.4. a 

In a housing survey of a town a systematic sample from a list of all houses 
was taken with a sampling fraction of 1/50. 62T houses out of a total of 8491 
in the sample were classified as defective. What is the estimated number and 
percentage of defective houses in the town ? 

627 

Percentage defective = 100 p = 100 X = 7*38 per cent. 

Total number defective = U = 50 X 627 = 31,350 

Example 6.4.b 

If the values in Table 6.4 are taken to represent measurements on a random 
sample of 20 objects, selected from a batch of such objects with a sampling 


Table 6.4— Sample of 20 measurements 


6-2 

8*0 

8*2 

110 

13*8 

12*0 

8*7 

10*3 

8*0 

10*7 

8*5 

14*6 

7*6 

9*1 

10*1 

8*0 

10*3 

10*4 

9*3 

9*0 


fraction of 1/25, estimate the mean measurement of the batch, and the total 
of all the measurements of the batch* 

N = 25 X 20 = 500 
S(y) = 193*8 
1 

y = 2 q X 193*8 = 9*69 
Y = 25 X 193*8 = 4845 

If the number in the batch is known to be 507, a slightly more accurate 
estimate of the total is 

Y' == 507 x 9*69 = 4913 
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6.5 Stratified sample with uniform sampling fraction 

The formulae for a random sample hold, except that if the numbers in the 
different strata N/ are known, and differ from N/, the formula 6.4.b is 
replaced by 

' „ Y' = 2(N/yO (6.5.a) 

and the formula 6.4.c by 

U' = £ (N* pi) (6.5. b) 

with corresponding slight increases in accuracy in y and p, if they are derived 
from these estimates by division by N. 

Example 6,5 

Table 6.5. a shows the wheat acreages of the stratified random sample of 
1 in 20 Hertfordshire farms described in Section 3.7. Estimate the total 
wheat acreage of the county and the mean acreage of wheat per farm ( a ) from 
the data of the sample alone, (b) given the total number of farms in each 
size-group. Estimate also the number of farms growing wheat. 

Table 6 .5.a Hertfordshire farms, 1939: acreages of wheat in a 

STRATIFIED RANDOM SAMPLE OF 1 IN 20 FARMS (STRATIFIED BY ACREAGES 
OF CROPS AND GRASS) 


Size-group . 
Acres . 

No. of farms 


21- 

50 

51- 

150 

151 

o 

o 

CO 

301- 

18 

26 

20 

13 

8 

0 

49 

19 

20 

56 

72 

0 

5 

10 

14 

24 

18 

92 

0 

0 

27 

4 

30 

17 

69 

0 

0 

33 

0 

59 

32 

78 

0 

0 

4 

12 

17 

71 

• 51 

0 

9 

30 

13 

! r 76 

'48 

84 

0 

0 

0 

0 

80 

70 

0 

8 

0 

0 

16 

0 

62 

102 

5 

0 

13 

28 

36 

0 

13 



0 

5 

0 

0 

92 



27 

23 



158 



10 

22 



62 



24 

3 



0 


Size-group 1 (1-5 acres) : 22 farms, no wheat. 

Size-group 2 (6-20 acres) : 26 farms, 7 acres , of wheat on 1 farm. 
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ESTIMATION OF THE POPULATION VALUES SECT. 6.5 

The results are summarized in Table 6.5.b. The estimate of the total 
area of wheat in the county from the sample is 

Y = 20 X 2011 = 40,220 acres 

The mean area of wheat per farm is 

y = 2011/125 = 16T acres i 

The number of farms growing wheat is 

U = 20 X 54 = 1080 


Table 6 .5. b—-S ummary of sample of Table 6 .5.a 


Size- 

group, 

acres 

No. of 
farms 
in sample 

Farms 
with wheat 
in sample 

Wheat acreage 
in sample 

No. of 
farms 
in county 

Total 
wheat 
acreage ' 

No, 

Pro¬ 

portion 

Total 

Mean 

1-5 

22 

, 0 

•000 

0 

0*0 

435 

0 

6-20 

126 

1 

•038 

7 

0*3 

519 

160 

21-50 

i 

18 

5 

•278 

35 

1*9 

357 

680 

51-150 

j26 

21 

•808 

386 

14-8 

519 

7,680 

151-300 

,20 

16 

•800 

710 

35*5 

400 

14,200 

301- 

I 13 

11 

*846 

873 

67*2 

266 

17,880 

All 

- 1 - 

125 

54 

•432 

2,011 

16*1 

2,496 

40,600 


If the total number of farms in each size-group is known, the estimate of 
Y can be calculated with slightly more accuracy by using the size-group means, 
as shown in the last three columns. This gives an estimate Y' of the total 
area of wheat of 40,600 acres, and a mean area per farm y' of 40,600/2496 = 16*3 
acres. The gain in accuracy is here quite trivial, since the variation within 
each stratum is large relative to the mean of that stratum. 

The number of farm? growing wheat can be estimated similarly from the 
proportions in the size-groups, giving 

U' = 0 X 435 + 0*038 X 519 + ... = 1083 
Again the gain in accuracy is trivial. 
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SECT. 6.6 


SAMPLING METHODS FOR CENSUSES AND SURVEYS 


6.6 Random sample, stratified after selection 

The means of, or proportions in, the different strata must be calculated 
separately, and formulae 6.5. a and 6.5. b used, with division by N for estimates 
of y and p. 


Example 6.6 

Table 6.6.a shows the data, including acreages of crops and grass, for 
the random sample of 1 in 20 Hertfordshire farms described in Section 3.7. 
Estimate the total area of wheat and the number of farms growing wheat 
(a) directly from the sample, (b) by stratification by size, given the total numbers 
of farms in the size-groups of Table 6.5.b. 

Table 6 . 6 . a —Hertfordshire farms, 1939: acreages of crops and grass 
(1st column), and of wheat (2nd column), of a random sample of 

1 IN 20 FARMS (CLASSIFIED BY DISTRICTS AFTER SELECf ion) 


District 1 

15 farms 

District 3 

40 farms 

District 4 

24 farms 

District 5 

4 farms 

District 6 

24 farms 

188 

16 

370 

67 

40 

0 

11 

0 

4 

0 

i 8 

0 

60 

0 

26 

0 

28 

0 

6 

0 

312 

102 

87 

14 

192 

0 

369 

58 

221 

59 

543 

80 

8 

0 

6 

0 

48 

0 

212 

45 

31 

0 

822 

265 

11 

0 

: 44 

0 

44 

0 

153 

20 

6 

0 : 

654 

112 


-- 

4 

0 

79 

33 

287 

44 

34 

0 

3 

0 

335 

102 

614 

72 

14 

0 

28 

0 

316 

75 

158 

50 



192 

20 

465 

92 

14 

0 

116 

33 

4 

0 



10 

0 

197 

0 

4 

0 

4 

0 

68 

27 

District 7 

24 

0 

163 

0 

17 

0 

409 

102 

55 

12 



2 

0 

198 

0 

2 

0 

6 

0 

4 

0 

10 

farms 

9 

0 

78 

0 

3 

0 

115 

0 

2 

0 



3 

0 

6 

0 

7 

0 

19 

0 

192 

24 

128 

5 

2 

0 

35 

0 

6 

0 

274 

6 

4 

0 

4 

0 

120 

24 

168 

0 

335 

82 

3 

0 

491 

24 

46 

0 

58 

0 

- - 

— 

4 

0 

144 

0 

224 

28 

181 

20 

20 

0 

1,935 

141 

1 

0 

3 

0 

280 

75 

17 

0 

30 

0 



4 

0 

482 

62 

90 

0 

24 

0 

197 

6 



180 

0 

156 

28 

3 

0 

10 

0 

14 

3 

District 2 

120 

11 

302 

71 

3 

0 

36 

0 

32 

6 





— 

--: 

6 

0 

12 

0 

2 

0 

8 farms 



! 4,851 

763 

4 

0 

89 

0 

285 

29 





i 


161 

80 


— 

138 

0 

8 

0 



i 

! 


! 246 

60 

547 

25 

126 

*0 

294 

29 



1 



— 




—,— 

597 

107 





4,034 

837 



2,027 

174 

8 

0 










— 

2 

0 







GRAND TOTAL, 



200 

65 







125 farms :— 

| 15,114 

2,301 

14 

0 











262 

58 









| 


1,385 

259 
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estimation of the population values sect. 6>7 

Table 6.6. ^Hertfordshire farms, 1939: estimation of wheat acreage 

from the random sample of 1 IN 20 farms (Table 6.6.a) stratififd by 
SIZE-GROUPS AFTER SELECTION STRATIFIED BY 


Size- 

group 

acres 

No. in 
sample 

Farms 
with wheat 

Acreage 
of wheat 

" - ■ ■' ' ’ - 

No. of 
farms 
in county 

Total 

for 

county 

No. 

Pro¬ 

portion 

Total 

Mean 

1-5 

25 

0 

0 

0 

0 

435 

r 0 

6-20 

26 

1 

*038 

3 

0-1 

519 

50 

21-50 

16 

1 

•062 

6 

0-4 

357 

140 

51-150 

17 

8 

•471 

159 

9*4 

519 

4,880 

151-300 

26 

20 

•769 

762 

29-3 

400 

11,720 

301- 

15 

15 

1*000 

1,371 

91-4 

266 

24,310 


12S 

45 

•360 J 

2,301 

18*4 j 

2,496 

41,100 


: 2301 x 20 — 46,020 acres. 


iN umber of farms growing wheat = 20 x 45 = 900 

(6) 2 ,he r*" “ d 

is then calculated for each size-group, multiplied hy the "total' num^r 
esJTd total' ST g ™ 8 " 

•038 x 519 -f -062 x 357 + . . . ==860. found to be 

6.7 Stratified sample (variable sampling fraction) 

N — S (gi nt) 

Y =*V{gtSt{y)) 

? = Y/N 4* 

U = S (gi «,) 

P = U/N 

by NfeT S^Te SlhdT"™ ‘° m "^ 6 - 5 ' a “ d 6 - 5 ' b ' with d ™»n 

j / dIlu P, are slightly more accurate. 

153 



SECT. 6.7 


SAMPLING METHODS FOR CENSUSES AND SURVEYS 


Example 6.7 

Table 6.7 .a sho W s the Hertfordshire farm data for the stratified systerr^c 
< am ole with a variable sampling fraction described m Section 3.7 Estimate 
total wheat acreage and the number of farms growing wheat. ; 

1 

t „„ 1 Q^Q • STRATIFIED SYSTEMATIC SAMPLE 

rp.p.Tp a 7 a—H ertfordshire farms, Lvov . biK , 

“ *<« WITH A VARIABLE SAMPLING TRACTION (CLASS^ 


RY districts) 

. -— - -- r— -" ' T” 

. 

Size-group: 
Sampling 
fraction : 
No. in 

sample : 

1-5 

Nil 

0 

6-20 5 

1/200 

3 

51-50 

1/60 

6 

51-150 

1/20 

26 

151-300 

1/10 

40 

301-500 

1/5 

43 

501- 1 

1/3 1 

17 | 

District 

: i 

— 

— 

0 

0 30 

6 

17 18 0 

28 

172 0 92 

56 

114 

119 ' i 

107 : 

ioi ; 

160 ; 


30 16 50 

55 62 

50 49 121 

63 72 100 
186 124 105 
104 

— 

0 

10 

0 0 
40 

2 

~~ 3 

4 

■ ■— 

■— 

0 

j 0 

25 5 

10 o 

0 

0 0 77 

0 41 42 

0 24 25 

61 

67 22 5 

58 75 51 

78 94 126 

86 97 

195 

120 

17 

28 24 

8 0 

5 

42 0 24 

54 60 75 

44 6 

88 65 58 

94 115 98 

121 80 92 

18 40 120 

268 i 260 
265 260 
112 155 
240 168 
209 


0 

— 

0 0 

22 31 32 

27 0 

66 142 26 

- 

5 

— 

38 0 0 

56 17 29 

0 

0 

72 

6 

—■ 

0 

: 0 

0 0 
0 o 

19 

— 

7 

— 

— 

— 

0 0 
14 

0 60 

16 

— 

0 

27 

♦ 

214 

_____ 

1,163 

3,292 

2,925 

Total 


i * rp i 1a ft 7 k Thev follow the same lines 
The calculations are shown 111 , T , 6 foreneh sttatum must be raised 

as before, except that the samp e f ract i ons we obtain estimates of 

seoaratelv. Using the working sampling fractions we oow 

42J65 acres of wheat and 911 farms growing wheat. 



ESTIMATION OF THE POPULATION VALUES SECT. 6.8 


Table 6.7.b—S ummary of sample of Table 6.7.a 


Size- 

group 

acres 

No. 

in 

sample 

No. 

with 

wheat 

Total 

acreage 

Raising 

factor 

Raised totals 

Mean 

acreage 

per 

farm 

No. 

Acreage 

1-5 

0 

— 

— 

— 

_ 

_ 

\i __■. . 

6-20 

3 

0 

0 

200 

0 

0 

5 0 

21-50 

6 

2 

27 

60 

120 

1,620 

; 4*5 

51-150 

26 

12 

214 

| 

20 

240 

4,280 

8*2 

151-300 

40 

30 

1,163 

10 

300 

11,630 

! 29*1 

301-500 

43 

40 

3,292 

5 

200 

16,460 

! 76*6 

501- 

17 

17 

2,925 

3 

51 

8,775 

172*1 


135 




911 

42,765 

i 



6.8 Use of supplementary information in estimation 

As already indicated, supplementary information on a quantitative character, 
the values of which are known for all the units of the population, can be used 
as the basis of stratification, or for the adjustment of an unstratified sample 
by stratification after selection. Alternatively, as mentioned in Section 2.8, 
such information can be used directly without stratification. Two methods, 
the ratio method and the: regression method , are available. In either case 
only the total or mean of the supplementary variate for the whole population 
need be known (in addition to the values for the selected sampling units). 
The ratio method is simpler computationally, but the regression method is 
in certain circumstances more accurate. 

In the ratio method, the ratio of Y/X in the population is estimated from 
the sample, the estimated ratio being multiplied by the total X of x for the 
population to give the estimated total Y of y for the population. The method 
of estimation must be such that bias is avoided. As already explained in 
Section 2.6, the appropriate estimate of the ratio for a random sample is 
S ( y)/S (#) or y/x. More generally, Rule 5 of Section 6.3 will give an 
unbiased estimate, though in certain cases separate values of the ratio may 
be estimated for the different strata as described in Sections 6.10 and 6.11. 

In the regression method the average change of y for unit change of x 
(known as the regression coefficient) is estimated, and this coefficient is used to 
adjust the sample results for any discrepancy between the mean size of unit 
in the sample and in the population. 
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SECT. 6.8 


SAMPLING METHODS FOR CENSUSES AND SURVEYS 


The contrast between the ratio and regression methods is illustrated in 
Fig. 6.8. The data plotted are those of Table 6.12. The dots represent 
the x and y values of the sample points, the sample mean (x> y) being M. 
Q f represents the known population mean x of the supplementary variate, 
which differs from x by QQ f . The line OMD through the origin and the 
mean represents the ratio yjx given by the sample, and the ordinate Q' 
of the point P ± on this line, equal to (yjx) x, gives the adjusted estimate of 
the population mean by the ratio method. The regression line AMB also 
passes through the mean, and has a slope b equal to the regression coefficient. 



Fig. 6.8 —Use of supplementary information : ratio and regression methods 

(data of Table 6.12) 

This line has the property that the sum of the squares of the vertical distances 
of the sample points from it is minimum. The adjusted estimate by the 
regression method is given by the ordinate P 2 Q f of the point P 2 , and equals 
y ~f- b (x — x). 

The regression method therefore differs from the ratio method in that in 
the former the straight line which best fits the sample values is taken, whereas 
in the latter the line through the origin is taken. When the supplementary 
variate x represents size of unit, the true regression line generally passes 


156 



estimation of the population values sect . 6>8 

straight ^sion’hneS pasTngThrouS 8 ^ rCSUlt in the h ^ting 
Nevertheless, in most census work in Su ^ ^ d ° Se t0 > the origin 
closely correlated with size, the ^ter^mnr /Tr* 8 “** or some variate 
any small gain in accuracy resulting from tb< l rat ‘ 0 met hod outweighs 

It may be noted that fo ? a g , the use of regression. 

by grouping the data according^ ThT^ta Ws^d^ ^ Can be plotted 

for the different groups. g and P Iottm g the means of y 

it is expected thef^ilPbe™ comity ^8^^ “ thiS b °° k ’ n<>t because 
the regression method represents an imnnrf m Cen ® us wor ^ but because 
without which no account of sammlin^ a! a*** Sam P bn g procedure, 

because the calculation ofthe 3 " S e?. * 7"“ be “”P I «c. 

18 subject can only be made byTe o?,h7 Wh,<!b a bal "“<‘ a ™ple 
If the population mean JZ ■ he re 8 ress,0n concept, 

phase of two-phase sampling th ^ f Stlmated from observations at the first 

”“7 Ph^/dte sartiS'oJ^rr l° r ,f b r 8 ° b “ i »' d « £ 

substituted for S.« If however .k„ m hoId . «be estimate x, beintr 

fmm the sample is substitutedfor?fcio™ S |'' Pi ““- 1 * nd "* estimate x 
random sample without supplementary infor^T 3 ? = y a PP ro P r iate to a 
words, there is no gain in aecumcv in Ihl 'T- 18 ° btained - In other 
known or alternatively is estimated from a W P °P ulat f n estimate unless * is 
In addition to their use in the adiustm Sample than ls available for y. 
mean or total of y when ° f ^ 

or is determined from the first phase ofLn i^ 0 '™ f ° r the P°P ula tion, 
regressions are of use for the purpose of oLn P3SC Samplin S> ratios and 
J for some standardized value o/r Hll !! 8 f 8tunat f of the means of 

the population can be made, freed from^hT ° nS ° f different P a rts of 

values of *. I n the case 6f ’ ™ the effects of variation in the average 

of ratios themselves: thus in an agricultural™ 6111 t0 COm P arin & the values 
quantities as number of sheep per 100 , T We may c °nsider such 

farm. Regression of ™"\ber of sheep per 

rhe ratio methocj is inappropriate • in a nutriri " t0 ^ ““f® m Cases in which 
well be found that the amount of’maln, f v ' SUrvey> for example, it may 
the relation will not be ^ ^ ^ of Lil r, J 

standardization ofthis typecanequallvwell^ krg ®' S f le sur veys, however, 
means, thus avoiding the troubled r^li^ made by usm £ the size-group 
„ The formulas in the following secf-it^ 61 ^ 3 * 1118 re ^r ess * on coefficients. ° P 
Formulas for the proportion of ufots posTessf 6 ^ ^ 2 , quantitative variate, 
by scoring each unit as I if it has the attrilf t attnbute can be derived 
supplementary variate * represents ize fo, ° therwise - If the , 

7 xrr 




SECT. 6.8 SAMPLING METHODS FOR CENSUSES AND SURVEYS 

Bxampk 6.8 AtetawJ Farm 

The data of Table 6 • 8 are extracted tomteW z J ¥isheries> 1946, G). 

Survey of England Wales ^re of Irops and grass, classified by size- 

They give the rents of holdings p Calculate ^ ents per acre standardized for 

STof * which the different size ' groups occur 

in the whole country. 

Table 6 .8 -Rents per acre for Berkshire and Cornwall 


Size-group 

acres 


Rent per acre (shillings) 


Berkshire 


Cornwall 


Proportionate areas of 
. different size-groups 
in whole country 


100-300 

300-700 


Overall 


Tbe proportionate areas for the whole country '“cightT^ 

btffx t + ^2 = 26 shillings |er aore, and that for Cornwall 

is 25 shillings per acre. the ove rall rent per acre is 

Inspection of the table shows 'that alt:there is little difference 
considerably less for Berkslnre than ’ the rents for Berkshire 

proportion of large farms in tliat county. danger of standardization of 

This example illustrates both t e effects of differences in average 

this type. The standardized rents ehmmate ff concentration 

size onthe average rent, which injo far as f m smaller ho i dingS) 
of buildings on the smaller holdings, g be incorrect) 

etc., do not represent differences in the two counties 

SHS r S owan" e er a'cm would be the same. Part of the difference 




ESTIMATION of tu» „ 

is due to the tend OPCI ' ATlON values 


Ratio method ; ranfl „_ 

random sample 


J 7 „ 

: r x 


(6.9) 


The formula for 9 ^ 0 ) 

Yo ofy fo r a standard value t" T* ** ° bta ' ni «g the « 8tanri 

Estimate the tot I 

273,^^ e a c°^,’ ^^H^^^^tota^are^of crops 3 and°gra^™^ 6 !,^ 

Wheat in ^ple = S(J - a " 4 3Cres 

Es ^te of wheat acreage ^ Jl ^ **• 

15^14 X ^074 

Example 6.9.b = ^ottcref ^ 

Table 6.9 gives 

R, ra ? d ? m sample of tteaaJT° f , PerSons belonging to 43 t , 

* raK ° Ns «nr p “ ° f ««i»: tot„ „ 

; , , 7*>. *.«. OT CLr7“ 

7Q , ? 89 * V ’ ^ 


95 
79 
30 
45 
28 
142 
125 
81 
43 
53 
148 


y 

18 

14 

6 

3 

5 

15 

18 

9 

12 

4 

31 


89 

57 

132 

47 

43 

116 

65 

103 

52 

67 

64 


:V 

7 
9 

26 

7 

17 
24 
16 

18 
16 
27 
12 


75 

69 

63 

.83 

124 

31 

96 

42 

85 

91 

73 


y 

12 

16 

9 

14 

25 

3 

45 

25 

35 

28 

13 


159 

54 

69 

61 

164 

132 

82 

33 

86 

51 


36 

26 

27 

2 

69 

41 

10 

8 

22 

19 


159 


3,427 799 



, 10 «*«»■«— « — A ;;^ ey - 
percentage of P er8< ®® * nd absent from the reserv 

belonging to the reserv _ _ J?®. x 100 - 23-3 per cent. 

Percentage absent = 100 F 3427 

3_2_5 x 3427 == 25,902. 

g^ 0 '^ Y ^ x 799 .eo 3 , 

Number absent from reserve ^ ^ ^ = 19 ,S63. 

Number ,— ■»^' „le M*J "*£&■ 

This «x»»ple ‘hougb»^ g ooiB co^toflrmaUSection 1 
different, m * at ^ i of t he sampling error t 

This affects .be —» Prado. 


This affects the estixna ung fraction 

„ .. sttatifi ed sample «>«>»>“<>■“ Si “' P ' ,ng 
6.10 Ratio metho • ^ ^ same for all st rata . 

W Wh “ *' f “* mXT'sample hold. different 

The formulre for a ran ^ ^ e different values for 

(6) When the rabo ts pe fo[ a random sample, 

strata: * using the formulae estimates for 

ja^riSrrH - srr — 

the separate strata, witn 

This gives {Si{y)-\ (6 ' 10) 

~'= :a lsr® ‘J 

etc. i j ( a \ a ud method (b) depend 

?^s=^^A=3s3flfi 

numbers of un determinations of t is correlation 

give reasonably a numbe rs are small a " d his ob j ec tion does not 

separate strata. “ ™ th od will be biased. (1J» J ■ US ed-see 
be^een r and *, theprobabilities proportional to 
hold if selection with p 

use of method (b) ; * the v 




, , rU ™LATION VALUES SFrr 71 

(3) Computational convenience—methnH / x • . U 

value of the ratio is involved. * 1S Slm P ,er > sinc e only one 

Example 6.10 ’ 

ratio method^slradfyin^the^dat^bv d°T'^ ^ ° f Exam P le 6.6 by the 
the ratio for the different districts.^ d CtS ’ and usm g different values of 

Table 6.10— Hertfordshire farms urn . 

from the random sample ofT ; n 20 FARMf MATION OF WHEAT acr eages 
stratification into districts BY THE RATro met hod after 



Sample 


No. 

Wheat 

S e ( y ) 

Crops and 
grass 

s < w 

Ratio 

n 

15 ' ' 

!. 141 I 

1,935 

-— ,.. 

•0729 

8 

259 

1,385 

•1870 


Estimated 

district 

wheat 

r,x, 



1 882 

•1440 

40,905 

5,890 

2,027 

-- -- 

*0858 

•--—-_1 

34,437 , 

2,950 

15,114 j I 

273,074 

43,010 


of St. AlbS^Zdwi^di^ l?Jb ofwhiI he nd ? hb0Urin g districts 
number of farms) have been combined C ° ntams mher a ®11 

^ rT T h0d 1 Stratified Sample with liable sampling fraction 

(a) Ratio the same for all strata : 

r — 

2 {gi Si (x)} 

( f =rx 

Y = rX 


UBfi/wy 



(6.12.a) 
(6.12.b) 


S1CI . ,. 12 S.MPUtto MBTHOOS POP ^ 

(» *»*> f “^fo““=dPP"'^ f “' i “' T1 “’ a '“ i>line 

Proceed to the »«m delations, 

fraction does not enter int 

612 Regression method: random sample 

The equation of the regression lme is 

H yi== p+b(x-x) 

where , 

■ b — S(x-xf 

Hence ’ y = y J r b{x — x) 

If N is not known exactly, it 6 .9 is obtained, and if 

Note that if b is put equal to 5 00/ ( ) values of b wi n give unbiased 

out equal to 0 formula 6.4.a is obtained. All^ ^ appropnate to the 

estimates, and consequently ^valujio taking b ^l is equjvalem m Ae 

roS:—sT-- t ^ 6 - i2 - b> 

in two-pka se sampling 

Regressions may be used ior 
in the same way as ratios. 

Example 6.12.a total of wh eat from the data of Example 6.6, 

Obtain an estimate of the total a 
using the regression method. 

WehaVe . 1S 408 N = 2496 S = 109-405 

S = 120-912 y- l * m SOT-207,« 

5 = 5,061,734 s( ) = 278,219 'S «00 = 

JS( ,) = W«4 ,SV-*S W s& _^.,»4, 9 0* 

-N 2 a 234 270 s (#-*)(y -y) = 624,739 ^ ,. . 

S{x-x) = 3,234, 1 and products is explained 

The method ofeataWon of *esum» J rented to the ealculatton 

to Section 1 . 1 . The .ton of *pm °> f 

of the sampling error. 

We then have 

_ 624,739_ ^ 0 . 19316 

7 . _ in ioc( in 4.00 acres 


l 



..XM 

located plots^of T/Tn ^ measured volumes of f u 

3Cre ’ and Cye est lmates of th^olum” 25 Syste maticalf 
6.12_]vr EASlIRED Volumes ' V0Ws P- V 10 acre h 

* OF m esximate; 

J * y PER VlO ACRE) 


l r 

m 


y 

170 

47 

64 

91 

126 

146 

87 


102 

14 

57 

70 

95 

92 

110 


195 

255 

135 

146 

154 

110 

112 


2081 
208 / 
1101 
110 I 

110 j 

110 J 

128 


:V 

153 

216 

125 

100 

287 

261 


% 

79 

177 

65 

196 

167 

268 


169 

182 

74 

24 

255 

35384 


1521 
152 f 
148 
2 07 
167 
C302 


th * » 5,302 

m Figure 6.8. The total a 6 19 ^ 8 ' 9 Cens us of Woodlands '!2 Countles . were 
two counties was 5m Tl C ° nifer stan ds over20 Th f ey are P^ted 
estimates of all th& a crts, and the total vni ^ ars a £ e in these 

unbiit^tir 6 ' iw - ow5 "“ * t: ^ 

above «ta». ° f *e total volllme „ f «. ft peracre. 

H7-36 X 10 v conse quently 

T he ratio method gives ' “ 7 ’ 551 ’ 000 <*• ft. 

* l473 ' 6/132 °- 8 - M14.000 cu. f t . 

(eqSenfm 77^/^™* ^ tdmaTes^on^ ** difcren «‘ 

° ~ 1} glves tegression method with an arbiLry^oeffiden* 

5124 ( 1 473-6 - 1320-8 + n 92) 


Jhe regression method nr * 6 ’ 891 ’ 000 «*• ft- 

coefficient. We find ^ ^s the calculation of the rem-essmn 
S (y-W= iiko»» ~. e 0 ression 


~f) 2 — 115,266 S(y _ ^ / v .. 

^ = 52,069/82,296 ^0-63270 

124 ( 14 73-6 + 0-63270(1192 

163 


52,069 (a 
- J 320*8)} 


- x)* = 82,2.96 ^ 


?33,000 cu, ff 



act. 6 13 M ‘™ 0DS M — s of estimation is disced 

The relative .curacy - *« — _ M ^ to , the survey, 

in Example 7 IT ' f ours , only » “"“'l 1P “ w* two counties gave a value 
The of the data for "* low, though not very 

STS'%**:-rsft a rj:vr M 

STSTrger *» **« ^ 

sSdoS; rr^sam^ s thc totio „ -*.tsx 

„ 0 , „«n "b^STof trees ^ “"^1 theTlves U biased f 
made, e.g. by the hat the sample P lots w ‘ u and the use of 

demarcated samp e ® ^ ^ ^ were somewhat^ ^ second stage 

The sample plots u Q f hardwoods, p ? f bias 0 f this 

ZZttoZPZ-fjr “T”«T TfiTSey Of Eng- and 

this cause en^TSness of the earhet survey. 

Wales confirmed the uniform sampling 

. method : stratified samp 

6.13 Regression metn all 

wW ir“reg,es».nc«»iae„,isas„med,.he,h« ^ 

S " The formulae for • ™dom sample hold, except that form 

replaced by s {Sijv _—yd ( x _Tii 

i= S {a (•-*')*> fferent val »es in the 

• • emitted to assume different 

{b ) When the regression is perm 

different strata: met hod. 

Proceed as m th variable sampling 

• method : stratified sample with v 
6.14 Regression meth 

(a) Regression the same ft* a_fi strata_ ^ 
where £{h 

b = -—WhSrfx-iitf} ' 
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w kh a similar expression for y w and g w are the estimates of y and x that 
ri d (s b e e et3£ thC SamplC ^^ ^ n ° information 

n»J12^ re81 ffl^ OP8 W u hin f rata 3re truly linear ’ with identical values of the 
gression coefficient, then the most accurate estimate of b will be obtained 

mversdy P r °P°rtional to the residual within-strata variances 
of j about the regression lines. If the regression coefficients are different for 

eouahtvTf 111 Strata ’ then i he com P° nent of er ror due to the assumption of 
quality of regression coefficients will be minimized by taking Xt proportional 

• £'f y Se f ° f \ wil] S lve a virtually unbiased estimate of y, and detailed 
investigation of the theoretically best values to adopt is seldom worth while. 

St W S h may b ? taken as unity i f aI1 ^ strata contain material of 
similar variability, i.e if the variable sampling fraction arises from extraneous 
auses not connected with the variability of the material, and equal to g t if 
he sampling fractions have been chosen so as to minimize the sampling error 
Under certain conditions Xt = gt 2 would be best in this case, but under other 

fractions^ ^ ''° U ‘ glVC excesslve wei g ht to the strata with small sampling 

(b) Regression different for different strata : 

Proceed as in the ratio method. 


6.15 Use of regression to calibrate eye estimates 

It sometimes happens that eye estimates or similar subjective measurements 
k . ™ ad t.° n a P ro perly selected and unbiased sample of the population 
but that the objective measurements y, which are required to calibrate these 
estimates, can only be carried out on a non-random sub-sample of the original 
sample. The eye estimates cannot then be used as supplementary information 
ih the manner of Example 6.12.b, since any bias in the sub-sample used for 
the objective measurements would be reflected to a greater or less extent 

depending on the value of b, in the population estimate derived from the 
regression. 

In this case the regression of v on y, instead of y on x, must be calculated, 
and the equation of estimation must be replaced by 

y = y + y — *)> 

when b’ is the regression coefficient of * on y, y and g are the means for the 
sub-sample, and x, is the mean of the eye estimates for the original sample. 

This procedure is subject to certain limitations. Firstly, the sub-sample, 
though non-random for the whole of the population, must be effectivelv 
random for units having any given value of y. If, for example, there is a 
tendency to select units which, for a given value of y, have high values of x 
serious bias may result. Thus, in a crop-estimation scheme, if eye estimates 
are made on a random sample of fields, and if reliance is placed on returns 
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by farmers of the actual yields of some of these fields, any tendency on the 
part of the farmers to return only the yields of fields which have turned ou 
better than their appearance would indicate will lead to an overestimate of the 
yield. On the other hand, the omission of a greater proportion of the low- 
yielding than of the high-yielding fields from the sub-sample will not bias the 
results, provided this omission is conditioned only by the final yield and n 
bv the previous appearance or the value of the eye estimate. 

Secondly, for accuracy in the final estimate, the eye estimates must be 
reasonably accurate in the sense that variation about the regression line must 
be small, and the line itself must have an adequate slope. If the regression 
line is curved, this curvature can only be allowed for in the estimation fonnu a 
if the variation about the regression line is negligible. Otherwise bias will be 
introduced. The use of the best fitting linear regression line, however, will 
avoid this source of bias. 

Example 6.15 

In order to test the accuracy of eye estimation as a method of estimating 
the yields of cereal crops shortly before harvest, a trial survey of the wheat 
crop of Hertfordshire was undertaken in 1940. Two observers were employed, 
one of whom visited 47 farms, observing 110 fields, and the other 16 farms, 
observing 37 fields. The whole set of farms constituted a systematic sample 
of 1 in 12 farms, excluding those growing less than 5 acres of wheat in , 
a random sub-sample of fields being taken on the larger farms The actual 
yields, as determined by the farmers, were subsequently obtained for as many 
of the observed fields as possible, and these were used to calibrate the eye 
estimates. The relation between the eye estimates, x, and the actual yields 
per acre, y, for the first observer are shown in Fig. 6.15 for the 37 fields for 
which yields were obtained. Obtain an estimate of the mean yield per acre 
for the part of the county covered by this observer. 

The regression coefficient, V, of * on y, calculated from the unweighted 
values of * and y for the 37 fields, is 0-6926, the regression equation being 

= 30-00 + 0-6926 (y - 28-78) 

This is shown by the full line in the figure. The dotted line represents the 
line that would be obtained if there were no errors m the eye estimates. It 
will be seen that there is a tendency to underestimate high yields and over¬ 
estimate low yields. The other observer and the farmers gave very similar 
results 

The mean of the eye estimates for the whole of the first observer s 
sample, and that for the eye estimates x and the yields y of the fields for which 
yields were available, weighted according to the acreages of the fields with 
an additional raising factor if all the fields on the farm are not sampled, are, 
in bushels per acre, 

= 30-12 * = 30-13 y = 28-95 
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Hence, since 1/0-6926 == 1-444, the final estimate of the yield per acre is 
? = 28-95 + 1-444 (30-12 - 30-13) = 28-94 
The adjustment is here negligible, since x 1 and x are almost identical. 



Fjg. 6 . 16 — Relation between eye estimates of the yields and the actual yields 
OF 37 FIELDS OF WHEAT IN HERTFORDSHIRE 


6.16 Sampling with probabilities proportional to size of unit 

(a) Size, x, of all units of the population known, or X known : 

. In thl f , case * acts as a supplementary variate, and the ratio method will 
in general be appropriate. Since the probability of selection is proportional 
to * raising factors proportional to 1 /* must be introduced into the formula: 
already given. This leads to the formulae 
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In other words the unbiased estimate of the population value of the ratio is 
given by the arithmetic mean of the ratios from the selected sampling units. 

(b) Total size X of the population not known : 

In this case X, as well as Y and f, have to be estimated from the sample. 
Selection has to be made by some such process as randomly or systematically 
locating points on a map, and points not falling in the units under consideration 
must be taken into account. If n 0 is the total number of sampling points, 
and A is the total area covered by the sampling grid, we have 


X = A njn 0 
Y = r X = f A n/n 0 

Alternatively, if A is not known exactly the density d of points per unit area 
may be used. We have 

A = n 0 /d 
X = njd 

If the sampling is two-phase, with n,' points (density d') at the first phase, 
of which ri fall in the units under consideration, n, n 0 and d must be replaced 
by and d’ in the above formula. 

Example 6.16, a 

In a survey to estimate the area and yield of a crop, systematically located 
points at a density of one per 4 square miles are taken, and the yields per acre 
of the fields in which the points fall and which carry the crop are determined 
by the harvesting of small areas. 8317 points in all are obtained of which 
529 fall in fields carrying the crop. The arithmetic mean of the yields per acre 
of the selected fields is 15-7 cwt. per acre. Estimate the total area and yield 

of the crop. 

A density of 1 per 4 square miles is equal to 1/2560 per acre. Hence 
area = X = 529 X 2560 - 1,354,000 acres, 
yield = Y = 15-7 X 1,354,000 cwt. = 1,063,000 tons 

Example 6.16 .b 

If, in addition to the yield data of Example 6.16.a, a further 24,938 points 
were surveyed for type of crop only, giving an overandensity^ththe 
8317 points of the yield survey, of one point per square mile, and.1673 of the 
fields so located were found to carry the crop m question (in addition to the 
529 fields above), obtain revised estimates of the total area and yield of the c p. 

This is an example of two-phase sampling. The two sets of points together 
constitute the first phase, for area of crop, and the 529 points for which yield 
samples were taken constitute the second phase. 
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We therefore have 


n 0 


24,938 + 8317 -- 33,255 

and = 1673 + 529 - 2202 

Consequently 

• t at the first phase is 1/640 P- acre. 

The density at th , 40 = 1 409,000 acres 

rea = X = 2202 X 640 , \ 06 ,000 tons 

T, _ Y = 15-7 X 1,409,000 cwt - 

• >ie wtities proportional to 

617 Samples * r ■. * 

size of unit ^ be knoWn . As P^^^ata, 

Tn this case the sizes of all from so me or all be 

SechontlO,. if *> ^ 

the same unit not 8 d sUg ht bias wi stratum, giving 

^rsssf ^ - — d separately 

equations of estimation i (y\^ f . 

Fi== «i W 

Y = S{fiXi} 

p = YiX 

*-* h 6J \ ^ of and ■»“ 

b SSSS^ - wo - de - n „ c0MBiNffi 

parishes selected 

to SIZE 2 I ^ _ — --— 

^ 311 228 249 686 

Wheat • • 1 164 1 , 440 2 040 12,370 3,330 2,290j*^ 

■» » i 7 

Ratio • l 5 ' 

4 

District 



Ratio 
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Table 6.17 5 p ' < ® Bra ® ™ s™ Kys 

o—E sTJMATJON of Whe . t „ 

r__ _ OF Ta ble 6.17 a CReage fr °m the data 


District | ^f ean ra tio T 7~T; " i ' ■— 

I “d grasT I Cro P s an d e grLs I E ac™ ated 

- —_______ In district acreage 

I —- L I of wheat 


22,932 

43,591 

57,263 

73,946 

24,964 

34,437 

15,941 


273,074 


1,120 

10,200 

7,790 

15,230 

5,890 

4,920 

1,510 


further ^mputaTont fo? whtat af^ 68 ^ shown in Table 6 

Sam T ^ computations for numll /f ln Ta ble 6.17 b ' J7 ' a > a «d the 

f aken P ro Portional to size f/° babl K v > but when the troh F/r° n ° f P aris bes 
be ^en for all variate*.““ ° f Un "’ tbe «*> to size, 

^•18 Multi-stage samplin 

;T---e d „: ; orab " e ■" - «°* s 

■2 Slt“ch b V‘ e 

P«c„, ar sub.^,. »nd d* second J ™ 8 '"f /*«or of g 

8 factor /' of the 

^ ^ general f 0rmuJa fo r 

n o supplemental^ information, 

where the sum™ 4.* . ^fev) 

all units , wi(h 

mkr formula, for N, etc. 
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If the combined raising factors for a group of units are equal then the 

In^artST 8 "; 11 „ b r mpIl 5 ed by SUmm ‘ n ' g these units before multiplication. 
treVlf ’ lf 3 h f combmed raisin g fetors are equal, the sample can be 

■Sol 01 PU f P ° Se l° CS ‘ lmation as if k were an ordinary random or stratified 
random sample with uniform sampling fraction. 

6.19 Multi-stage sampling with supplementary information 

(a) Ratio method, ratio the same for whole population: 


7 


S ( gy ) - 


etc., where g ~ g' g". $ (gx) 

(b) Ratio method, ratio different for different parts of the population • 
Many variants are possible. All can be resolved by proceeding stage 

bfaslfthe 7 h m ? h ° dS aIrCady OUtHned The danger of introducing 
recognized. ^ ^ °” the ratio8 are based is sma11 must be 

(c) Regression method : 

whVh greSS1 !u S WiU USUally be em P lo y ed at the first stage of the sampling, in 
Ianner CaSe coefficien * or coefficients will be calculated in the 

values ofT?^ ? thC , tyPe , ° f SampHng ad °P ted at this ®tage, using the 
stage samplfng° ta S ^ ^ y f ° r 6ach mam Unit estimated from the second- 

can VVVT iS t USed at th ® second sta S e the procedure for stratified samples 
n be used, treating the selected first-stage units as if they were strata!^ 

(d) Sampling with probability proportional to size: 

witlfin sTratf 311 ^ 386 W “ Whkh the drst " sta g e units are sampled from 
sTmnhnv Wf P robabdlt y proportional to size, and the second-stage 
In this S r are f Ch ,° Sen S ° aS to glve a uniform overall sampling fraction. 

proti Sect on h ST ? at the ** Stage of tbe estimation 

LVhe\VT } Wdlbe f0und t0 be equivalent to the direct estimation 
trom the second-stage units by means of the overall raising factors, i.e. y = S (gy). 

Example 6.19 

Fertilizer Prart'^ ^ I' 19 ' 1 ’ obtained in the cou rse of the Survey of 

SfLt ~N:rf* ma,e ,he avera ®'° i ■>» 

SectVn rVV; am V g P ro , cedure of this s 'urvey has been described in 
V" 4 - 2 f- fnformation is lacking from a few farms, mainly owing to 
hanges in tenancy. Since this affects the small farms to a greater extent 

been usecW™" ^ adjl , 1Sted sam P ling Actions shown in the table have 

is avafiabt IVaV !f al t0 , the number of farms °n which information 
vided by the total number of farms in the size-group. The 
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second-stage sampling fractions are given by the reciprocals of the number 
of fields, since one field is selected on each farm. The combined raising factor 
for the sampled field of the first farm in Table 6.19. a is therefore 105, that for 
the sampled field of the third farm is 105 X 3 — 315, etc. 

Table 6.19.a— Survey of Fertilizer Practice: data on the application 
of nitrogenous manures to sugar beet on old arable land in Norfolk 


No. 

of 

fields 

. Cwt. 

Acreage N 

No. 

of 

fields 

Acreage 

Cwt. 

N 

No. 

of 

Acreage 

Cwt. 

N 

Total Sample 

Total Sample 

per 

acre 

fields 

Total Sample 

per 

acre 


Small farms 


Medium farms 


Medium farms (contd.) 


1 

1 

3 

2 

1 

1 

1 

1 

2 
2 
1 
2 
1 
2 
2 
1 
1 
1 
2 
1 
1 
1 


2 

6 

5 

4 

5 

3 

4 
2 

6 
8 
2 

4 
6 

5 

6 
2 

4 

5 

3 
7 
2 

4 


2 

6 

2 

3 

5 

3 

4 
2 
2 
4 
2 

3 

6 

4 
4 
2 

4 

5 
2 
7 
2 
4 


•68 

2 

13 

10 

•42 

1 

12 

12 

•16 

•63 

4 

40 

5 

•63 

1 

6 

6 

•30 

•55 

1 

8 

8 

•15 

2 

14 

7 

•42 

•42 

3 

14 

4 

•42 

3 

21 

6 

•14 

•36 

2 

51 

11 

•63 

1 

8 

8 

•42 

•21 

2 

16 

10 

•90 

2 

10 

2 

•49 

■21 

3 

19 

7 

•21 

1 

4 

4 

•30 

•42 

3 

31 

10 

•84 

1 

4 

4 

•30 

•42 

3 

39 

4 

•63 

3 

14 

7 

•45 

•90 

2 

19 

13 

•52 

6 

42 

10 

•36 

•52 

1 

9 

9 

•52 

1 

6 

6 

•21 

•15 

1 

20 

20 

•42 

1 

4 

4 

•30 

•70 

5 

26 

7 

•30 

2 

19 

8 

•42 

•36 

2 

8 

6 

•54 


Large 

farms 


•56 

4 

20 

8 

•21 

1 

8 

8 

0 

0 

1 

4 

4 

•42 

1 

48 

48 

0 

•21 

1 

20 

20 

•72 

3 

19 

4 

0 

•48 

2 

7 

4 

•36 

3 

56 

24 

•68 

•10 

4 

32 

11 

•63 

2 

22 

5 

•36 

•32 

' 2 

16 

8 

•82 

4 

30 

5 

•42 

•80 

2 

6 

3 

•63 

3 

20 

5 

•90 

•30 

2 

20 

10 

•56 

6 

126 

29 

•75 


1 

7 

7 

•57 

6 

28 

6 

•63 

without sugar 

beet on old 

arable 

land : 

small (6-50 acres), 8 


small, 1/105 ; medium. 


medium (51-300 acres), 11 ; large (301- 

Sampling fractions (adjusted for absence of information) : 

1/59; large, 1/30. 

The average dressing of nitrogen must be obtained by calculating the 
raised total of the amount of nitrogen applied S ( gy ) and the raised total of 
the acreage sampled S ( gx ). The amount of nitrogen applied to a field is given 
by the product of the acreage and the rate per acre. The three size-groups 
are best kept separate in the computation. For the first size-group, therefore, 
applying the second-stage raising factors, we have 

S(g"y) =S(g"rx) = 1 X 2 X 0-68 + 1 X 6 X 0-63 + 3 X 2 x 0-55 + • • • 
= 1-36 +3-78 +3-30+ . . . 

S(g"x) = l X 2 + 1 X 6 + 3 X 2+ ... 

= 2 + 6 + 6 + . . . 
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This gives the results shown in Table 6.19.b. Applying the first-stage raising 
factors to the total nitrogen and total acreage, we obtain the average dressing 
of nitrogen per acre : 

_ 46*13 x 105 + . . . 28,512-04 

r 104 x 105 + . . = 58,229 = ' 49 ° cwt per acre 


Table 6.19.b—E stimation of average dressing from raised results 


Size-group 

Total 

nitrogen 

s (g"y) 

Total 

acreage 

S(g"x) 

Nitrogen 
per acre 
S{g"y)!S{ g " X ) 

--l;. 

First-stage 
raising factor 
g' 

Small 

46-13 

104 

•444 

. . 

105 ; 

Medium . 

285-41 

601 

•475 

59 

Large 

227-64 

395 

•576 

30 


The data presented comprise only a small part of the information collected, 
and the above method of estimation therefore demands a good deal of 
computation. For certain purposes comparative figures may be obtained 
from the straight averages of the dressings per acre on the sampled fields. 


Table 6.19.c—Estimation of average dressing 

FROM UNWEIGHTED MEANS 


Size-group 

No. of farms 

Nitrogen per 

acre 


Sum 

Mean 

Small 

' 22 

9-30 


•423 


Medium . 

36 

16-32 


•453 


Large 

9' 

3-74 


•416 


All 

67 

29-36 

•438 


These averages are given in Table 6.19.C. The first-stage raising factors 
have not been used in this calculation, since the larger farms have more and 
larger fields in sugar beet, so that inequality in the omitted second-stage raising 
factors more than compensates for the difference in the first-stage raising 
factors. 
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It will be noted that the mean dressings are less than those previously 
obtained for all size-groups, indicating the possibility that farms with little 
sugar beet, which are overweighted in the straight averages, are using less 
nitrogen per acre than those with a large amount of sugar beet. The data are 
too variable, however, to determine with certainty from this sample alone 
whether this is really a bias or is due to random sampling errors. The large 
difference for the large farms, for example, is due to farm 8 having a very 
large acreage of sugar beet. The relative accuracy of the two methods of 
estimation, apart from bias, is discussed in Example 7.17. 

In the Survey of Fertilizer Practice the second method of estimation was 
used in investigation of secondary points, e.g. comparison of different types 
of farms. For the more important estimates, such as mean dressings per acre, 
a modification of the first method was used, the total acreages of sugar beet 
on the farms being taken as the raising factors for the second stage. This 
method of estimation is slightly more accurate than the first method given 
above (the actual gain in accuracy is discussed in Section 8.9), but will 
be biased if there is any tendency for farmers to apply heavier (or lighter) 
dressings to their large fields. There is no evidence that any appreciable bias 
does in fact arise from this cause, but even so it is perhaps doubtful whether 
there is much advantage in using this method of estimation rather than the 
unbiased method given above. The method would have been unbiased had 
selection of fields within a farm been made proportional to area, but this would 
have demanded somewhat more elaborate methods of selection in the field. 

6.20 Systematic and balanced samples 

The methods of estimation described in the preceding sections are also 
appropriate for systematic and balanced samples. Samples of these types 
without other restrictions, for instance, can be treated as if they were random 
samples for the estimation of the population values (but not for the sampling 
errors) ; if there is stratification the procedure for stratified random samples 
holds. An example of a systematic stratified sample with variable sampling 
fraction has already been given (Example 6.7). 

Certain estimation processes are naturally inappropriate to systematic and 
balanced samples. If the process of selection in a systematic sample is such, 
for example, that stratification is automatically introduced, there will be no 
gain from stratification after selection. Equally in a balanced sample the 
variate for which balance has been effected will be of no further value as 
supplementary information—the balance ensures that the corrections based 
on regression, or ratio, will be zero, whatever the value of the regression 
coefficient. If each stratum is balanced separately, then the corrections for 
the different strata will all be zero, even if the regression coefficient or ratio 
varies from stratum to stratum. 

In systematic samples of material which varies in a continuous manner, 
some gain in accuracy may result from the use of what are known as 
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Se'cha’^el’Stf T "£*£,"*** •"*** 

^Mo°Ss7„ch/J^ ® a tket?o» d ^ s “»p'« vaJ„« 

by ■*:■ "■ i and 

are omi,ted ' ,h “. *»■ “ L e d„Tr m o P r‘»' md - »r h , e 


* = te' + px" 

y ~y' -\- b(x —x') 

=? + fib (x" 


-- Jh« c h , 

The calculation of thp • ^ ^ (1 — b) (%" ___ ^ 

“ b “ ed on ,he ^ of * ** 

clo^as Tmely\obe\h TTehti °^^ the diffe «nces 
be nearly equal to nnt ^ ,f SC when this type of s ' n r W ° ° CCasi ° ns is very 

7 -K d e Z: d lZ\ and ^ CStimate «fS^eSft "r^’ * ^ 

the estimate of th** , e uni * s deluded on both ^ • Gr ^ e from that 

i>y adding me “ ot *»« »iU diffeST 0 ” ,y ' 

,Whe/a .It”? *> * —a 2L?t'° b “”' d 

replacement, a fraction f 7 SI2e 1S taken on each I h 1 occas >M. 
retained, the sample units^ h™® be ' n S replaced and aTaV" V^ 1 
estimate y 1 of the population £“* 310 retained can be used T * being 
method already give!! r ^ an ° n the second occasion 7 ? furnish an 

estimate^, equal toy" dJrivaW f° n there wil1 b e a further’ e , regression 

on the second occasion ’i 4 from the sampling dependent 

by a weighted mean of ^h The m ° St accurate estimate y W 'ti inc,uded 
Ml -ph*) and un ZM tW ° estim ates. The Zr be P rovided 

between the unit C i ^ ^ " ,« 2 r 2 ) where r io .u orre ct weights are 

co L he J c rs ; 4-7 r s ' 0 ~ -**« 

^ exception ZuZelfZf-f ^ Tbmh "oc^ regreSSi ° n 

we divide by the corresn ,v * C IVldln g by a quantity of n, occasi °ns, with 

«* a y pa,Ts:„^x %d,th ' 7 e 77 h 7 r 77 ^ 
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» 7T ” d — 

llX/a iL_ 1 


We thus have 


Yw 


- ^ 2 r 2 


^ (1 — ^r 2 ) 


/ i 2 r 2 


{/ + z> (x — «')} .f A-^^. 

\ 1 — ju 2 r 2 J - 


- ■ ' -- 1 v‘ _ n=^vv 

above tW0 0C “ « not the sa rae the 

?w = «'{/+*(*-<» + n»(l- pr*) y" 

n' -)- n" (1 — ^r a ) (6.21. a) 

«™ whi c „ „°d ~ pled -*■ fat 

™» oT^ZolZ^f- Z-Z“t Z;^l d by “if 8 "? ™ ghttd 

«o «** estimates are W - „,) and > (1 _ ^ 


t(l ~r) 


(y" — x") (6.21 .b) 


Change = f ~- (/ -x') + B± 

l — jttr ^ -/ 

of t™ 2r 8 s ,t ,h 7 iff r“ 

that once the sample for the second occasion has been taken a ^ 18 

S^atio^ k P° SS * le ^ ’S’S 

information. If this revised estimate c • OCC , asion as supplementary 

change given above will be very nearly equll to “ Icdated ’ t J en th e estimate of 
arises from the fact that „ r ,l,L *u/ ec l ual t0 Y™ ~*w. The slight discrepancy 

the estimate of t **° "*^ 

equal toy-^ Tth* ^ ' “ T* t0 J ’ the ^ovTelmT of changed 

7£^‘^jz?£2£ sr iszrr »f : 

w^StSKSSss 

ss s»ss?=-*«nH 
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SECT. 6.21 


SAMPLING METHODS FOR CENSUSES AND SURVEYS 


Example 6.21 

The percentage solids-not-fat in two successive months for a l the 16 cows 
in a hXhich were in their 2-6 months of lactation m one, at least, of these 

months were observed to be : 

i 2 3 4 6 6 7 8 

^ k ’ ' 8 82 8-94 9-86 8-90 9-00 9-13 8-90 9-02 

November . 8 82 8J)4 ^ __ g . 08 8 . 66 8 -68 8-86 

December . 

Cow . • 9 10 11 12 13 14 16 16 

: o'S !S 9-38 8,8 9,0 9.04 

November. 

The sampling is here due to natural causes, and the number of cows in 
theiTVe months of lactation will therefore not remain completely consta . 
The above two months were selected from more extensive reco 


We find : 

s (*0 = 

S(x") = 


S (x' — x') 2 


-- 73-53 
= 36-52 
= 9-1912 
= 9-13 
= 9-1708 
= 0-3435 


} S(/) = 72-43 

j x' S(y") = 36-30 

j>' = 9-0538 

3 y" = 9-075 

708 ‘ y== 9-0608 

435 S(x' —x')(y —?) 

: = 0-4076 

r = 0-4076/-\/(0-3435 X 0-6742) 


S(y’ -y'f 


- 0-1374 

- 0-055 
- 0-11 

0-6742 


It will be noted that the value of b is greater than unity, which illustrates 
1 1 no ; n t made above that r provides a better estimate of the reg 

The esLdiofi oft should y be ^ — 

extensive data, though no very high accuracy is required 100 pa r 
STbe full, adequate. In the present case the more eBens.ve data confirm 

that the above value of r is about correct. 


We also have X 


i and thus 


_ 0-724 { 9-0538 + 0-847 (9-1708 - 9-1912)} + 0-276 X 9-075 = 9-0471 
Change - 0-929 (- 0-1374) + 0-071 (- 0-055) = - 0-1315 







estimation of the population values 

The estimate of the mean fot November, revised on the basis of the 

December values, is n-97fi y 9-13 = 9-1786 

v - 0-724 49-1912 + 0-847 (9-0608 - 9-0538)} + 0-276 X 9 

* nan Agreement is here exact, 

giving the check, 9-0471 - 9 ' 178 ^ ~~ anc j ' x on y have both been taken 
L* the two regression coefficients, y on * and * on * 

equal to r. 

, 22 Sampling on a number of successive occasions 

6.22 bampuag u ... w section cover all cases of 

The formulas of estimation gi™ g ^ u is carr i e d out with partial 

sampling on two occasions on y. suc ^ simple general solution is 

replacement on more than twoocc which are ver y similar in form to 

possible, but certain approximate salirt , fficient for m0 st practical 

those for sampling on two occasions, are likely 

purposes. , . , • r „ npate d at intervals it is generally desirable 

In a sampling scheme which is P “ population mean on each 

to provide as accurate an estimat!“previous occasions. Suppose 
occasion without any revision of the ® Stl , P be obtained for occasion h, 

tha, ya is the most accuse ‘ „P » and including .hi, 

taking into account the re f u estimate for occasion h — 1, taking 

occasion h, and that y h - i » T „ occas ion h - 1. Subject to certain 

intt account the results up » and^ludtng oc = n 4 ^ 

+ ]>> + **’ 

brSkets, and double dashes units taring on £ “Led „„ eac h 

The limitations are that a give^n f occasions an d the correlation 

occasion, that the variabi i y nn etant and that the correlation between 

r between successive occasions are^n ^ ^ ^ . g f>> etc . This last 

occasions two apart is r , tha included for more than two occasions, 

normal c— * * 

”*T h b"', depend, on -he value °f >,“ 

(p = — 2A r 2 

f , O have been given in the previous section. For 
The values for h = 2 have oeen g for all occaslons after 

practical purposes the limiting vaue o <f J H> D. Patterson), 

h _ a above formula for <(> is due to ivir. 



° WUNG METHODS *>* CENSUSES AND SURVEYS 

-and formula 6^.^^ ^ determined > *e values of <p can be calculated 
For most practical purposes v - v •„ 
of the change between occasions* A -VanT I ^Tf^t “ adequate esdmate 
interest, however, formula 6 21 h u * ^ change is of particular 

-of course not agree exactly with b ® U *. ed ’. n latter estimate will 

inconsistencies in the summary thfVesufe 6 lead t0 apparent 

a broadly following 

umts on the different occasions ’ TV ^ t° S ?, me lne quality of numbers of 
9 the value <p' gi ve n by ’ hlS Can be aIIowed for by substituting for 


9 ~ fxn h 9 ' (6.22. b) 

not included^o^^^hpre^j^'^^si'o ^ 38 * 011 *’ ^ iS the nUmber of unit ? 

Tabl, 22 Percentage souds-not-eat : adjustment oe sa„ L es tae® 

__ON SUCCESSIVE OCCASIONS 


January I February I Manih ' April May Jmu , 


; n 

j 

5 9-400 

n 9-090 

\y\ . . j 


4 9-288 

y'\ 

! 

— 

7 8-977 

lYhf • 

_ 

~ 

9-341 

Yn 

9-400 

9-122 I 

[ y'h] . 

Yn - [y\] . | 

4 9-335 

8 9-072 

I 

+ -065 

+ -050 

9 / • . 

t 

From 

r 

•603 

differences 

9-400 

9-353 


4 9-288 8 9-122 8 9-086 3 9-060 7 9-326 

’ 8-977 1 9-020 2 8-950 5 9-302 4 9-380 

9 341 9-163 9.188 9-226 0.900 


•050 + -126 + . 2 , 


1_ j | J 9-403 9-4 64 9-577 | 9 . 636 | 







ESTIMATION OF THE POPULATION VALUES SECT. 6 . 22 : 

Example 6.22 

Similar data on percentage solids-not-fat to those of Example 6,21 are 
given in abstract form in Table 6.22 for the months January-June. Only 
the 3-5 months of lactation are included. Obtain estimates of the mean 
percentage in successive months. 


The table shows for each month the overall mean y h> the mean of cows 
occurring in the previous month y h ' f and the mean of new cows y h ' f . The 
numbers of cows on which these means are based are also shown. TThe mean 
for the month h — 1 of cows occurring in months h and h — 1 is shown 
in the line [y h '] in the column for month h — 1 . Thus 9*335 and 9*288 are 
the means for January and February of the four cows occurring in both these 
months. 

Summation of the sums of squares and products of deviations of pairs 
of entries for successive months from January to December gives an overall 
value for r of 0*811, so that r 2 is 0*657. The similar calculation of the 
correlation between months two apart gives r' equal to 0*746. The assumption 
that r f equals r 2 therefore somewhat underweights the information obtainable 
from occasions two apart. 

The average value of ju over a long period will be 1 / 3 , but considerable 
fluctuations in numbers occur from month to month. For ju == 1/3 the value 
of (p for occasions subsequent to the second is 


- (1 - 0*657) + VO “ 0*657){1 - 0*6*57 (1 - 4.2/3.1/3)} 

9 2 x 2/3 x 0-657 ' “ °’ 0 

Hence for March 


<p f = p °‘ 252 = °* 084 


etc. These values are shown in Table 6 . 22 . 

For the second occasion formula 6 .21.a may be used. This is of the same 
form as formula 6 . 22 .a, and gives 


7 (1 - 0*657/5) 

9 “ 4+ 7(1 - 0-657/5) 


= 0-603 


Had the value for pi = 1/3 been calculated, and corrected by means of formula 
6 . 22 .b, we should have obtained 


<p = 1 


2/3 


0*657/9 


: 0*281 


9 


11/3 


0*281 = 0*536 


which does not differ greatly from the correct value. Equally 99 differs little 
from the value 0*252, obtained above for subsequent occasions. 
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SECT. 6.22 SAMPLING METHODS FOR CENSUSES AND SURVEYS 

The remainder of the calculations follow a standard pattern. The quantity 
{y h }, equal to y h ' + r(y h ~ 1 — [ y h ' - i]), is calculated, and the weighted mean 
ofJk" and {y^} taken, with weights equal to qf and 1 — 9/. Thus for 
February 

{y h } = 9*288 + 0*811 X (+ 0*065) = 9*341 
y h = 0*603 X 8*977 + 0*397 X 9*341 = 9*122 

The overall estimates y h and the estimates from differences y h ' — [y h f _ x ] 
are shown for comparison. The differences show a tendency to cumulative 
errors, which is to be expected even with close correlation. 

It will be seen that once a value for r has been determined, and provision 
has been made to abstract the means yy h '\ and [y h f _ x ], the calculations 
are very simple, and can easily be undertaken for large-scale surveys, even 
when a number of different quantities require estimation. 
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CHAPTER 7 


estimation of the sampling error 

7.1 Sampling ermrs of a random sample 

actual error in the estimate of the mean frL** 7% "’ ' ' Then the 

where r is the selected unit. sample of one unit will be z r , 

of the estimates from a large number of ^ 0^7 ^ 3Verage of the mors 

the signs of the errors) wilf approximate to zero /hk ^7^ fegard t0 
there is no bias in the estimate. C ° Zer °' This is equivalent to saying 

—"tSSEJ? 0f .«™ wo 

account of sign. One simple measure wS ^ f Whkh d ° eS take 
of all the *’s without regard to^ign but a^ l^ - ** ^ the ™ge 
number of statistical advantages TvroiiZZ^ whidl has » 

all the *»s. This is termed the mean^ZZZt ^ “ ? ean of the squares of 
and is denoted by V ( y) or o 2 nmt -5 e deviation of y or the variance of y 

root of this variaU sleLd’Te ZT7T ^ V( ^ ° r * The square 
In the same way, T” " ° f . a singIe 

the sampling variance of an unbiased estimate™ ^ - ? nits we ma ) r define 
sample as the mean of the squares of i-fJ ? ^ Y ’ denved fr om such a 

»mples of the samo , ize ° f “ !“*= of 

<»t,mated, by V (y). The rf d,,* !,. *, b f v (f). or. if 

standard error of the estimate and will Z f *1 g ® nera % ter med the 
estimated, by S.E. fyi Th P ’ j , den °ted by S.E. (y), or if 

to the standard deviation of a single linit ^articT T SOmetimes applied 
m the nature of errors of observation ’ P CuIarly when the deviations are 
The standard error of the estimate of thp i • 
a sample of one unit is therefore equal °to p ° pulat “ n mean derived from 

nnit. If a sample of two unitsTLd , i 1 1^, deviation of a single 

of the population mean will be k ’ he 3CtUa error of the estimate 

*Tiv , , y ~f —f — y = i(z r + % s ) 

Of tlfa^i™,e°^ CStimate WiH therefore b ' e gi ™ ^ the square root 

T f pi. . . . a O + zs) 2 = i (z* -l Zs 2 + 2zr . 

population „ tag* the average ta|ue of ^ , s ^ ^ ^ ^ ^ 
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we consider a series of samples having^ ^T'^botTo 8 !' a Hence the average 
units The average values of stand ard error of the 

value of the above expression is |« ■ Consequently 

estimate is <j/V 2 - , , , „ mimpn1 . does not depend on the form of 

It will be noted that the above g example for positive and 

the distribution of the s’s—there is non^ ^ ^ how ’ever, require that 
negative deviations to be equally f J and independe ntly selected. If, 

each unit of the sample shall be ra y u th a deviatlon 

for instance, there were apendency to«*^ a not be zero, 

similar to the first unit the averag sample of n units, for which the 

—- - 


V(y)=-V(y), S.E.(y) 


V M 


(7.1.a) 


We thus have the important general tesuh ‘'Thmr'tdy 

of * neon of . 

^Thfi^ err *e ithmite SZ 

t Stanford £w7. ^oS.dVis ».”ubie« to sapling varitnion Th» 
S.E. (Y) = S.E. (gny) = gn (S.E. (y)} - g°V n \ ’ 

Although S (J) is not itself an estimate iUs often^ c "f nt t0 C ° nSlder ** 
sampling variance or standard error. From the abov 

V(5(y)}=«V(y), S.E.{S(y)}=<V« ( 7Xc 

The standard error of an estmate can byxpressed ^ ^ 
population value of the estimated q ua T ; . which the estimate 

as die percentage standard error is mean, of the total of 

STSS-SSfi* TWs is sometimes termed the of variation. 

Si it h, . i we have, in a hur popn^on 

s.E. % (y) = S.E. % {S (J» = SE - % (Y) “ ( /o),V 

Thus in a population with a percentag population mean 

* p- “»*■• *** frora • ° f 400 ” m 

* InSt Sm 

terms an estimate of the va ^ m erical values of the selected units from 

from die deviado« ^-T of the »u = . vduta o „ ^ „ a 
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SECT. 7.1 


first approximation an estimate of cr 2 will therefore be provided by the mean 
square deviation S(y-y)y n . Actually the sum of the squares of Z 
deviations from the sample mean is always less than the sum of the squares of 
the deviations from the population mean, as can be seen from the identity 

S (y —y) 2 = S(y — y) 2 — n(y — yf 

The average value of the first term on the right-hand side is and the 
average value of the second term is a 2 , since y - y is the error in the estimate 
of the mean. Thus S (y — y ) 2 has an average value of (n — 1 ) <j 2 and 
consequently an estimate of cr 2 is given by 


* 2 s (y-yf (7.1.d) 

The divisor « — 1 is technically known as the number of degrees of freedom 
associated with the estimate of error, and is equal to the number of independent 
comparisons that can be made between n values. 

The calculation of the sum of the squares of the deviations S(y-y) 2 
is best done from the sum of the squares of the values themselves. By this 
procedure the calculation and squaring of the individual deviations, which 
often involve fractional values, is avoided. One of the expressions 

& ( y y) 2 = S ( y 2 ) — ny 2 

= S(y 2 )-yS(y) 

= 5(y 2 )-^{5(y)} 2 

is used. The last term of each of the three expressions is usually termed 
the correction for the mean.” In calculating it from one of the first two 
expressions, y must be taken to at least as many significant figures as ™ 

required in the correction. For this reason the last expression is often the 
most convenient. 

Sometimes it pays to use some convenient round number y 0 as a working 
mean, in which case we have J wonting 


etc ^(T-jT^y-yo^-nty-yo ) 2 

If a calculating machine is available the individual squares should not be 
down-,he sum of squ „ ra can be obtained dtacdy by faring\he 

numbers successively without clearing the machine. 4 ® 

in Z C m plt U “™ It.lT ° f * q "““ f ” m gr0 ' ,p ‘ d "*“ is 

Example 7.1,a 

*^7? St f and l ar< ? e l r0r of the estimate of Ae mean of the population 
(assumed large) of which the values of Table 6.4 are a sample. F P 
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SECT. 7.1 


SAMPLING METHODS FOR CENSUSES AND SURVEYS 


The computations are as follows: 
n — 20 
S (y) = 193*8 


S(y 2 ) = 1959*12 
yS(y) = 1877*92 


y = 9-69000 S(y—yf= 81-20 

S 2 - = 4-274 = 2-07 2 

* _ 19 

S - E - (?) = = ± 0462 


Example 7.1 .b 

Table 7.1 gives the distribution of family income in a sample of 162 white 
families in Norfolk-Portsmouth, Virginia. Calculate the mean income of the 
samole and the sampling standard error of this mean. 


Table 7.1— Annual net income of a sample of 162 white families 
in Norfolk-Portsmouth, Virginia, 1934-6 



No. of 
families 

(2) 

Working 

units 

(3) 

Calculation 

Calculation by 
successive 
summation 

Annual 

net 

income 

$ 

(i) 

Total 

(2) X (3) 

(4) 

Sum of 
squares 
(2) X (3) 2 
= (3) X (4) 

(5) 

Total 

(6) 

Sum of 
squares 

(7) 

600- 

10 

- 3 

-38 

90 

10 

10 

900- 

23 

— 2 

- 46 

92 

33 

43 

1,200- 

40 

- 1 

- 40 . 

40 

73 

116 

1,500- 

32 

0 

0 

0 

116 

- 

1,800- 

28 

+ 1 

+ 28 

28 

57 

105 

2,100- 

20 

+ 2 

+ 40 

80 

29 

48 

2,400- 

4 

+ 3 

+ 12 

36 

9 

19 

2,700- 

2 

+ 4 

+ 8 

32 

5 

10 

3,000- 

1 

+ 5 

+ 5 

25 

3 

5 

3,300- 

2 

+ 6 

+ 12 

72 

2 

2 


162 


- 11 

495 

105 

358 
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w . , " SECT. 7.2 

working Imit o“e^Tthe cTt"!' 8r °“ P " U '™ 1 “ 

zz«l;rz «r t 17 d i ( rs ralue ' 

calculation of the total and sum of squares if ThT*^ — ^ chosen - The 
and 5 of the table. The mean of\hf> \ be units ls shown in columns 4 
-11/162, U - 0-06790, and in the profS UmtS iS therefbre 

1649,6 - 0'06790 x 300 = 1629-1 
"" ° f 8<,uares ° f ,he in the working „„ its is 

495 - 0-06790 x 11 = 494-25 
and in the proper units is therefore 494-25 v snr >2 • .. 

alternative form of”cakuktioj 'm “ j“™"V™/”''' “ * vailable the 

Column 6 is formed from column 2 hv • nc * 7 ma ^ preferred. 

Column 7 is similarly formed from TT* SUI 2 mation from th « ends. 

73 + 57 + 32 = 162 for Jumn 6 and rh ^ N ° te the ^eck 

column 7 from the totals of column 6 Ttf ^ff 8 - ° f the final vaIues for 

given by the difference of the totX of thl 1 f 'TP* UnitS is 
105 - 116 = _ n. The ° f the tw f ° ha lves of column 6 , U. by 

of column 7 and deducting the sum of thTtotL^fd!^ by d ° ublin S the total 
i-e. by 2 x 358 — 105 — 116 = 495 1 1 the two halves of column 6 , 

7.2 Sampling from a finite population 

is nIuat° Ve ifthJ if *he population 

® 8== n~~t s p (y ~ y) 2 

to regarding the popul^fiTas fedf^raldo 16 p0puIati ° n - This is equivalent 
population with variance c 2 With thX d f? sam P le fr °m an infinitely large 
stands without modification.^ °* d^ 7,1 d ^ 

the alternative definition with divisor N X f b bs “ d scientific papers 
factor (N - 1)/N into the fLula for^ 2 and S \ ^ the 

m the discussion of the errors of sampling frn fi dS ° tber com P li cations 
avoided by the use of the rr St°deSon g ^ *** P ° pulati °ns which are 

the mean" oTodfe^est^ roquSs mo+fi S *? nda , rd e . rror ° f ** eStimate of 
la - Vd -/), or more st^V^^ *• 

■ «*«-./+/ M 
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, rT 7 , 2 SAMPLING METHODS FOR CENSUSES AND SURVEYS . 

That the introduction of some /f*-' p° r 
if the whok W^^^dcduced by an extension of the algebraic 

when testing the diiTerence between h tQ different ca usal agencies 

to see whether, for exampethey^ ^ there is a real and consistent 
In this case we are concerned to d™ tw0 populations : in other 

difference running throughaUthe^ can reas0 nably be regarded 

words, we wish to test whether the^ population , 0 r whether 

as random tTbf r S ega°rded twfdffferent parent populations. 

Example 7.2.a estimates obtained in 

Estimate the standard errors applicable to the 

Example 6.4.b. 

^ 17 13 ^ = 4-274, and consequently 

From Example 7.1.a, s -* ’ 

/ f 4-274/ M = ± 0-453 

S.E. (y) = A/ 1 20 \ 25/ J 


and since 


Y = 500 y 

S.E. (Y) = 500 X 0-453 = ± 226 
S.E. (Y') = 507 X 0-453 = ± 230 


^ — fra " ,he r “ dom 
of 125 farms of Table 6.6.a. 

We find: _ 9 oni y = 18-4080 

n = 1-25 164,904 s 2 = 164,904/124 = 1329-9 

cc yi\ = 207,261 y* , o iq 

l li (?) - VU»M o r,^'a -X)f=S X m X 3-18 - ± 

S.E. (V) - “VO 26 x sum rf squ „ es 0 f deviations may 

The calculation of the mean an T p e groups should be so 

tsefSat th C e ^ “^facr 




ESTIMATION OF THE SAMPLING ERROR sECT - 

,a,„ es in units of the grouping interval anda fild te^um “ 

(- 209-t6 + 136)/125 = -73-75/125 = -0 BW 

and in terms of the prop* x 10 = 18 . 60 

"““I (1VW-2 _ ,3-75 X 0-530) X 107124 = 167,570,124 = 1351-4 
The rest of the computations proceed as before. 

Table 7.2 —Calculation of the ““ 

(wheat acreages of Table o.o.a;__ 


If the data are fully. tabulated, grouping is scarcely ^ 
small a body of data, even when a c f cu ^ t1 ^ the grouping while 

-especially when, as here, the existence o ze sauares With material on 

h“he k r It most easily and‘compactly presented 
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^r pe i!r^i/^„t v rr s if "i' 0 ™ of «*«**. 

Grouping also enables’the form of thl^sttibut.'frto'h ,a |, Ue8 “ kr8 '' 
comprehended. In the present data th» 1 i T be much more easd y 
between 20 and 30 and the single relatively large number of values 

apparent. ' ® ver y igh value of 265, are immediately 

7.3 The normal law of error 

Of I hC set°.3 SSltS TJ, iS , P °rf ble ' ‘T ,he ™l"» 

of the mean of the population’ This ^ 6 C standard error of the estimate 
to be expected Tf P f' M ' u Thls § lves us a measure of the average error 

the^uency wtth £ 

types of material but it is a fnt-t' + urse > vary considerably in different 

of^distributionsof the pamnt mSrTa/ t^T ™ ^ 

the mean, total etc are snhieet- a- .® rr01 " s t0 w hich estimates such as 
what is known as he normal Taw T dlStnbuted approximately according to 


/(*) dz 


-e-*IWdz 


aV(27t) ' 

where a is the standard deviation, and e is the base nf mw • , 

-2-71828 approximately 6 base of Napierian logarithms, 

i-.^^assts-s 

0, dz — o-l and multiplying by 1000 ^ P g lj 

t a x ive „ tst* d r ia,io ” s ° f a 
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ESTIMATION OF THE SAMPLING ERROR 


SECT. 7 4 


Table 7.3-Observed and expected frequencies in the sample 

_ ° F TABLE 6 - 4 FR °M A NORMAL DISTRIBUTION 

Observed *0 V 2-5 ^ H 7 12 12-1 _ 3 13-14 14 ~ 15 > 15 Total 

Expected 0*45 0-88 


2*5 5 2*5 

1-84 3*00 3*83 


5*5 

3*83 


1 

3*00 


0*5 

1*84 


1 

0*88 


1 

0*33 


0 

0*12 


20 

20*00 


- v ^ zu*uo 

is™ sm / ro P° rtionat f frequency of observations greater than + i. 0o 
IS U*dl73, and consequently the • ® Acatcl man dc a 


S.E. (s) 


± 0-324 


’ V(2 X 19) 

r e ,h“'“ tr ore *- - -—- ■*«. 


*- £ * Qualitative variates 

T pte u be ">« 

variate can be divided into 1 1 H' x ° t CS imate ® derived from a quantitative 
of the variability of the individual samJ^ 8 ^’ being the esti mation 

of which e ,h 7r 

derivation of the standard prmro +u t - P g error;, and the second the 
Of the individual S ^ ” ,KS “ ° f 

qJSvTfn “ij 0 h ” “" d » consideration i» 

attribute of the samZa u„rde^ T P *1 h ° Wever - ,he "““r »f an 
Possessing the attribute fn the population 0 " H 7 |lr0 ‘ K ’ m " rl of units 
estimate of the variability of ^individual “ Fandom sam P les no 

For a redout setup,/fl*^ £?i? ‘ ^Le 

/pq 

n 


V(P) = 


pq 

« : 


S.E. (p) = 


fc,c,ion/is ,ppreciabie *• “ 


S -E. (p) = y'{ pq (i __ yy w i 

iS 0b ““ d by rep,ad “* (1 * flf-MK - I). 


and hence 


V(«) =«pq(l -/) 

S.E. (U) =g^{n pq (1 -/)} = N{ S.E. (p)} 
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SECT. 7.4 SAMPLING METHODS FOR CENSUSES AND SURVEYS 

Fig 7.4 shows the way in which the standard error of 
percentage S.E. (100 p) varies with the percentage 100 pjm 

by^r^Tme S L dotted line gives the percentage standard error 

100 S.E. (100 p)/100p. 



o, 1U » 20% 30% 40% 50% 60% 70% 80% »/, 

PERCENTAGE IN POPULATION , 100 p 

broken line the percentage standard err ® ? , percentage. The scales shown are 

te*!£5£"oM e « 5K “X7 <££.?"io.ooS «*>■ «»• «■« «-»•»“ 

scale by 10, etc. 

The standard errors obtained with larger samples for which the sample 
-S STpowe, of 10 cm also be read 

the scales by the appropriate power of 10. Thus for »““gL . ^ 10 . 
scale for the sample of 100 is divided by 10, since ^( 0,000/100) 10 

The actual standard error has its maximum value aP -05* A 
nnint the standard error of the estimated percentage with a sample ot 1U 
rfo.td^s^p.. of 1000 i, 1-58. «. if .he We pwWage » 50 »» 
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usually lie between 40 per cent 

cent, with a sample of 1000 Expressed in C WCen 7 PC1 Cent ' and 53 P er 
errors and limits at this pote the 8tandard 

in the population decreases from 50 per cent thT T T*" A . S the Percentage 
estimated percentage also decreases hut th* ’ * tUa standard erro r of the 
to increase. With 100 p = 20 per cent the P ^ Ce " ta S e ® ta n d ard error continues 
of 100 is 4-0 and the P ercentageSan dar d frrorln With a 

cent, they are 2-2 and^p T^ZsZctZI rF' "£ 5 ^ 100 P =* « £r 
serves to verify that the proportion fn a populari^’ WhlIe . qmte a sma11 sa mple 
is small, the determination with any accurL^ of the aTT® 3 3ttribute 
the attribute requires a relatively large sample whe^ the nUm ! >er P ossess mg 
In estimating the samnlino- +u ^ . n P ro P°rtion is small. 

be replaced by its estimate p from the' 'sSFi? h“ tZ P ° PUhti ° a p ca “ 
sample. This results in a certain amount ’ ■ by th ® P r °portion in the 
since the proportion in the sample will not m Fu estimate of variability, 
in tile population, bu, in W P ^'L"°!''l g ““ al be «q™I to that 

census work, this is not likely to li of * “ “ c co ' nn “ ri| y ™et with in 
of the problem is posleby Jse of T,h, vm^T o' E ““ 

Biological, Agricultural and Medical Research ° ° f St ' iUmc “ l ™‘s for 

nniW oTl^hlTt7Slo^ “Su,“ Z* ^ <* 

Stratum taken separately. In other cas^ fc® P ° rmulae a PPty to eac b 

types of sampling with supplementarv infhr ^ uItI ", sta g e sampling, and all 
depends onlf on the ** V3riability no ^ger 

example, the formula are not applicable to^hl ^ ° f Strata ’ Thus > for 
a given crop when two-stage samnlinl Z , P r ° P0rtl0n of f ™ growing 
farms within selected districts has^ee^ Z ad ™ nistrat rf e districts, and by 

applicable to the proportion of individuals of agiven race in a hum ^ T 
when the sampling has been by households ^sinct X 3 h . Uman Population, 
is usually of the same race the variaKil.V ’ n ? tb ^ hole of a household 

rand - Sample of individuals had been takeT ^ th3n if a 

sampling iLtturt « the in “ 

variate, scoring the qualitative variate 1 or 0. (See ExanTple 7.S.bT^ 
Example 7.4 

Estimate the sampling erects of the estimates of Eiample 6 4a 
We haye p = 0-0738, q = 1 - 0-0738 = 0-9262. Hen« 

S.E. ( P ) =, / r 0 -°738 X 0-9262 / l \ 

VL 8491 A ~5oJJ == ± 0 - 00 2 8 l 

sf±T nt T def f tive) = 100 X 0-00281 = ± 0-281 

S-E. (total number defectiye) = S.E. (U) _ 50 x 8491 x 0-00281 _ ± , 1M 
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Thus the percentage defective, 7‘38 per cent., has a standard 
± S 8 , which implies taking limits of plus or minus twice the standard^rmr 
that the true percentage defective probably lies between 6-8 per cent. and 
i S per cent Similarly the number defective probably lies between 29,000 
and *33 700. Note that the standard error expressed as a percentage of t e 
^ceSage defective or number defective, what is ordinarily called the 

percentage standard error, is 


0-281 , 1190 

S.E. % (p) = S.E. % (U) = -y^g x 100 31,350 


X 100 = ± 3-8 per cent. 


These standard errors are likely to be slight overestimates, since the sample 
was in fact systematic. 

7.5 Standard errors of functions of estimates 

If we have a number of estimates y u y 2 , y s , • • • with samplmg errors 

which are independent, the sampling variances being V (ft), V ( y 2 ), ( y 3 ), 

and we form a linear function of the y s: 

L = 4yi + hYs + hYa+ • • * 

where the Z’s are any multipliers whose values are not influenced by the 
sampling, the sampling variance of L is given by 

- V(L) = Z 1 2 V(y 1 ) + 4 2 V(y 2 ) + 4 a V(y 3 )+ - • • ( 7 - 5 a ) 

The condition of independence is important. The sampling ^rs of tw 
estimates will be independent if the estimates are deHyedfroms^of^cs 
which are themselves independent. Estimates derived f^ amples ot 
different populations, or from different strata of the same population, are 
consequently independent, as are estimates derived from different samp es o 
a large population. Estimates derived from two different variates belonging 
to th! sLe sampling units are not in general independent, since such vanates 
are likely to be correlated, high values of the one being associated with high 
(or low) values of the other in the same sampling units. 

A number of important simple formula are derivable from the abo 

gen The standard error of a multiple of an estimate is the same multiple of the 
standard error of the estimate: 

V(/y 1 )=Z 2 V(y 1 ) ( 7 - 5b ) 

S.E. (fft) = l S.E. (yO 

T his formula has already been used in Section 7.1. 

The standard error of the difference of two independent estimates is the 
square root of the sum of the squares of the standard errors of the estimates . 

V(y 1 -y 2 )=V(y 1 ) + V(y 2 ) ( 7 - 5c ) 

S.E. ( yi - y 2 ) = V[{S-E. (yi )} 2 + {S.E. (y 2 )} 2 ] 





estimation of the sampling erkor sect. 7.5 

The standard error of the sum of a number of independent estimates is the 
square root of the sum of the squares of the standard errors of the estimates : 

V ( Xl + y 2 + y 3 + ...) = v (yj + V (y 2 ) + V (y,) + ... (7.5. d) 

S f‘ (yi + y 2 + y 3 + • •.) == VT(S.E. fo )} 2 + (S.E. (y 2 )} 2 -f (S.E. (y 3 )} 2 + .. .] 
which may be expressed by the rule that “variances are additive.” 

A he standard error of the estimate of the mean of a large population can 
also be derived from the formula. 

Weighted means are a type of linear function which occurs frequently in 
statistics. The general form of a weighted mean is ■ 




■ Wl Wi Y* + 

W 1 + w 2 + 


where the w’s are the weights. Knowing the variances of the y’s, the variance 
o y w can be calculated, provided the y’s are independent. Two cases are of 
frequent occurrence. 


(1) V(y 1 )=V(y 2 ) 

We then have 


( 2 ) V (Xi) = A/tt*!, etc,, 
We then have 


= V(y) 


V(y w ) 

where A is a constant. 


S(w 2 ) 

{S (w)}* y W 


V (yw) = 


S(to) 




(7.5.e> 


(7.5. f> 


{S(*0} 2 S(w) 

This is the form of weighted mean which is used when we wish to obtain 
the most accurate combined estimate from a number of independent estimates 
of the same quantity whose relative variances are known. The weights are 
taken equal (or proportional) to the reciprocals of the variances, and the variance 
of the weighted mean is given by the reciprocal of the sum of the weights, 
(or a multiple of this reciprocal). 

A further type of weighted mean is that in which the weights w are in the 
nature of supplementary information, the quantities y and w both being 
determined from the individual sampling units, with the variances of the y’s 
related m some unknown manner to the w’s, and y w = S(wy)IS(w) 

In order to obtain an unbiased estimate of V(y w ), whatever the variance 
law, the squares of the deviations of they fromy* must be weighted in proportion 
to w before summation. For a random sample, if 

Q = Sw 2 (y —y w y 


and 


we have 


V 


V(y„) 


■QI(n- 1 ) 

= C 1 ~/) nsf 
{S (w)Y 


(7.5.i 
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It may also be noted that if the variance of y for given w can be regarded 
as constant over the range of w, and there is also no variation in the mean 
value of y for given w over the range other than that ascribable to random 
variation in y, the efficient estimate of the variance of y is given by the ordinary 
formula 

V(y) - S(y — y) 2 l(n - 1) (7.5.h) 

and formula 7.5. e may be used to estimate Y (yw), with the introduction of 
the factor (1 — /). If the variance of y for given w is inversely proportional 
to w then the efficient estimate of the variance of y is given by 


Q' = Sw (y — y w ) 2 = Swy 2 — yw Swy 

V 2 = Q'I(n - 1) 


£q' 2 is an estimate of A, and formula 7.5.f can be used for estimating V(yw)> 
with the introduction of the factor (1 —/). Either of these estimates will 
be biased if the true variance law is different from that assumed or the other 
condition does not hold. They should therefore not be used without careful 
consideration. 

The mean ratio r used in the ratio method of estimation is an example 
of a weighted mean of the above type, since r = S ( y)/S (a?) = S (xr)/S (x)> 
and we therefore substitute r for y and x for w. This case is discussed in more 
detail in Sections 7.8—7.11, which deal with the estimation of errors in the 
ratio method in both random and stratified samples. Normally formula 7.5.g 
will be used to estimate Y(r), but under certain circumstances formulae 7.5.h 
and 7.5.e may be employed. 

The approximate formulae for the standard errors of the product and the 
ratio of two estimates whose sampling errors are independent may also be 
noted. These are given by 


V(y 1 y 2 )-y 2 2 V(y 1 ) + y 1 2 V(y 2 ) 

,,s 


yi 


y-i 


) 


(7.5.j) 

(7.5. k) 


These formulas are only satisfactory if V (yj) and V (y 2 ) are small relative to 
y x 2 and y 2 2 respectively. 

If the estimates y v y 2 , y 3 , . . . are not independent the concept of covariance 
must be introduced. The covariance between two estimates is the mean 
product deviation, and is estimated in exactly the same manner as is the 
variance of each of the estimates, with the exception that the sum of squares 
of the deviations of a single variate is replaced by the sum of products of the 
deviations of the two variates. If the covariance between y, and y 2 is denoted 
by cov (y x y 2 ) the additional terms 

+ 244 cov (yog + 244 cov (YiYa) + 2 44<*>v (y 2 y 3 ) + ■ • • 
must be introduced into the formula for V (L). This gives the additional 
term - 2 cov foy 2 ) in the formula for V (y, - y 2 ). The corresponding 
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additional term in Y (y^) is + 2y x y 2 cov (y^), and that in V (yjy^) is 
— 2 cov (yiy 2 )/yiy2 within the bracket. 

If y lt y 2i y 3} ... are derived from different variates belonging to the same 
sampling units, e.g. measurements of different characters, the variance of any 
linear function L can, if desired, be estimated directly by calculating a value 
L for each sampling unit separately and estimating V (L) from these values 
in the manner appropriate to a single variate. This obviates the calculation 
of the variances and covariances of the individual variates. The same method 
can be followed with products and ratios, subject to the same limitations as 
those given above for formulae 7.5.e and 7.5.f. If the errors of a number 
of functions are required, however, it is best to calculate the variances and 
covariances (Example 7.8.b). 

The regression and correlation coefficients can be expressed in terms of 
the variances and covariance. We have the relations b = cov (xy)/V (#) ? 
and r = cov (xy)jV{V ( x ) • V (?)}. 

In the more complicated types of sampling, discussed later, the estimation 
of covariance is again exactly parallel to the estimation of the corresponding 
variances, the squares being replaced by products wherever they occur. 

Example 7,5 

Calculate standard errors for the various estimates of the regional and 
varietal differences between the yields of potatoes given in Tables 5.23.C, 
5.23.d, 5.23.e and 5.24, given that the variance of the yield per acre of any 
one variety in any one region is 4*22, and that the standard deviation is 
therefore ± 2*05. 

The standard errors of the regional-varietal means of Table 5.23.b are 
obtained by dividing the above standard deviation by the square roots of the 
numbers of fields. Thus for Majestic in Scotland the standard error is 
2-05/V3? = ±0*34. The standard errors are shown in Table 7.5. 


Table 7.5— Potato survey : standard errors of regional-varietal means 



Scotland 

North 

E. Midlands 

South 

West 

Majestic . 

± 0*34 

d= 0*24 

± 0*20 

± 0*20 

± 0-24 

King Edward . 

± 0*32 

± 0-55 

± 0*22 

± 0-25 

± 0*31 

Great Scot 

i 

± 0*48 

±0-55 

— 

± 0-84 

4- 0*48 

! 

Arran Banner . 

± 0*72 

± 0-33 

— 

± 0-68 

± 0*38 

Kerr’s Pink 

± 0*25 

± 0-34 

— 

— 

± 0*57 
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These standard errors enable the differences between the individual means 
to be examined more critically. The difference between Scotland and the 
Northern region for Arran Banner, for example, is at first sight anomalous, 
being — 012 . The standard error of this difference is \/(0-72 2 + 0*33 2 ) 
— i 0*79. This difference, therefore, does not conflict very seriously with 
the other differences. 

On the other hand the difference between this difference and the largest 
positive difference, that for King Edward, is + 1-94 — (— 0 * 12 ) = -f- 2*06. 
This quantity has a standard error of i/(0*32 2 + 0 - 55 2 + 0-72 2 -f 0 - 33 2 ) 
7 ± 1,()2 / lt mi S ht therefore be judged unlikely, on this evidence alone, 
that the difference has arisen by chance, since Table A .2 shows that a difference 
of 2-0 times its standard error would arise by chance in less than 1 in 20 times. 
In statistical terminology the difference is significant at the 1 in 20 level of 
significance. This conclusion, however, is subject to the qualification that 
we have here picked the two extreme differences out of 10 possible pairs. A 
combined test* of all 5 differences shows that they are not exceptionally variable. 
A more comprehensive test of the differences of the whole table confirms 
this verdict. It may be noted, however, that Arran Banner is only i as common 
in Scotland as in the Northern region, whereas King Edward is 3 times as 
common in Scotland as in the Northern region. The observed differences 
are therefore in the direction that would be expected if the varieties were 
grown in the regions to which they were most suited. 

The standard errors of the means of Table 5.23. c may be calculated in 
a similar manner, that for mean (a) of Scotland for example being 
W (0-34 2 + 0-32 2 + 0-48 2 _+ 0-72 2 + 0-25 2 ) = ± 0 - 20 . The corresponding 
vahie for the Northern region is ±0-19. The standard error of the estimated 
difference 8*55 — 7*46 — + 1*09 is therefore \/(0-20 2 + 0-19 2 ) = -{- 0*28. 
Similarly the standard error for the corresponding difference of the means (b) 
8*98 — 7*37 = + 1*61, is ± 0*38. V ' 

Table 5.23.d provides an example of a weighted mean with the weights 
so chosen that the most accurate combined estimate is obtained. Pormula 7 . 5 . f 
is therefore appropriate, and X represents the variance of a single field, i.e. 
X 4*22. Hence the variance of the weighted mean difference = 4 * 22/74 
= 0-0570, and the standard error is therefore ± 0-24. 

The relative efficiency of the above estimates of the differences may be 
assessed from the ratio of the reciprocals of the variances (Section 8 . 1 ). 
Assigning a value of 100 to the weighted mean, the relative efficiencies of 
means (a) and (b) are 73-5 and 39-9.f 

Finally we may evaluate the standard errors of Table 5.23.e. These 
cannot be evaluated exactly, as the pooling of regions is based on the assumption 
that there are no differences of any importance between these regions. In so 


* The weighted sum of squares of deviations gives x 2 
freedom. ° 


5*74 with 4 degrees of 


thf Sf* represent the true efficiencies, which are obtained by assigning 

the value of 100 to the most efficient possible method, here that of Table 5.24. 8 


ESTIMATION OF THE SAMPLING ERROR SECX . 7 j- 

era” !kl“d « S'itlitriThTth”" b ' introdu “ d - “ d »>M 

be underestimates of the true errors. & ^ ^ ^ dlfferences wiU therefore 

froJ'lhe'SSb^S’fiddf th u P °° led rCS '° nS C8n be “Iculated 
Majestic is ^(4-22/356) f “olf ’Se »1"T ? b ““‘- Thra th “ for 

«» have already been g£„ to TaHe , 5 T”d °d ^ S “ ,tish 

weighted mean for Majestic is th^fo “ gi Je„ by " r< ’ r of 

X 0-11 2 -f l 2 X 0-34 2 \ 


vr 


5 2 


») 


! ± 0-11 


± 0-13, y ± 0-29? ± a 0^24 er and ±0.21 ^ V3rieties are found t0 b e 

be obtained in Table 5.24 can only 

equations giving the least-squares solmfon^Thi^ the . Slmultaneous linear 
arithmetical work. The Jl7 T-‘ , re< l ulres a good deal of 

Methods for Research Workers, Section^ f ° r eXam P Ie> in Statistical 

the ^tandlr?' be any need t0 det ™ 
particular differ™ canbeob^ Tu* t0 the Standard ™ of any 

estimate given by Method (3) of Sertion 5^fo* 116 ^ 

by calculating what the ^ obtl,iMd 

classification, and if the relevant ™ri,nc . bC “ th were no cross 

The value of this latter standard error forthe 3 ^ withi ^ sub-classes, 

the Northern region, forexample" is ^ ****** betWee “ Sc ° tland and 


2-05 


1 1 \ 

174 + 177/ = ±°‘ 22 


ro,tT n f s “ - 

7.6 Stratified random sample with possibly unequal variances within 

T"" “” iK “ »* 

population is 10, and 50 oer cent of th, III V \ 1 ’ . T he mean of the 
square deviation or total variance is therefore 0-5 x O+H^p™ 
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sr£ s^rsissLi.irr-.M'A J - = 

square deviation or variance within the first stratum is therefore 1. 

“£ - *e sa„e *. »d ti« w™ 

c,Lk“d »d in . large population thi, will always be to tirao the total 
variance if there are differences between the strata means. 

If the sample numbers in all strata are sufficiently large for the within-strata 
variances sampling unit to be separately estimated, the sampling variances 
If the means or totals of the individual strata can be estimated separately, and 
the sampling variance of the population mean or total, which is a linear function 
Jtoln or to.*, in be obtMned by the nee of .he formul, of 

S 'ThL ™*od is rfd even if there is inequality in the 

variance per sampling unit from stratum to stratum, and is applicable to all 

types of stratification, including stratification with a variable sampling fraction 

stratification after selection. . . 

In general it is best to build up the variance of the population <*tunate 
under consideration by calculating the variances of the componentpaitsa 
adding these variances, or the correct multiples of them, the same steps b g 
fblSfed as ffithe calculation of the estimate itself. We will therefore_no 
stive formulae for the variances of all the different estimates set out in Sections 
6.4 and 6.5, but will illustrate the derivation of such formulae by obtaming 
that for V (v) in the case of a variable sampling fraction. • 

We have y = 2{gi Si (y)}/N. If <u 2 is the variance within the ith stratum, 
y{Si (y)}=n i ciH 1 -ft), and hence by formula 7.5.a 

V (y) = S{£< 2 m oi 2 (1 - fi) }/N 2 ( 7 • 6 • a ) 

For V (y) the <n 2 will be replaced by their estimates s; 2 . 

If all the sampling fractions are equal we have, since N — gn, 

V (y) = (1 -/) S (»i srW ( 7 - 6 - b ) 

If we require the sampling errors of estimates applicable todomainsof 
study which cut across the strata the situation is more complicated. If a dash 
isusedto denote the domain in question, and if «'• is the estimated variance 
per unit between the units of this domain in stratum z ^dtht propo^on 
of the selected units of stratum i not m the domain is q t , so that q, - («. w ‘ l/»‘> 
estimates of the variances of the total and mean of the domain are given by 
V (Y') = s gi 2 m’ fi — ft) {qi yi* + s <' 2 } 

N' 2 V (y') = (1 “/') & <* 2 > 

Consequently, in the case of a stratified sample with uniform sampling 
fraction, an approximate estimate of the sampling error of a domain mean 
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(See U £^p," d , n “) S “’' i ' i ' d *' «'»»*> 

In the case of a • tfle domain. 

otafT **7 I * mnst be 

£'^r = *Ms ‘•r-S’iss 

»dv» Mae SS,kJ stratification, , v^ a “ k “ y b '“ 1 »*™Wy inc rased g 
'“ill be dependent on P S r ° P ™ K ClreUmst “ ces ' The opdmeT'' 0 ” i’* 11 stiU be 

—i. 

Example 7.6. a Practice. 

Estimate the sampling errors of th 

T , g of the estimates of Example fi * 

the computations fr„- pJe °-5. 

” 10 Tab,e '• - vations 

TAB LE a ‘, C “eT2, dmdi ^ bf * ~ ’■ 

- 1 - TT " ™' ™” “«««T 


1- 


v(?;-) 

S -E(y,) 

•069 

* —- 

± 0*26 

•593 

i 0*77 

5*92 

i 2*43 

34*75 

■4- n.Qn ‘ 


142*3 i H .93 


aj . -*--i- _f 

tne variances of the Qt r£ »Eo » ' —!_ 

^sstthiT^-rnS? ilm* 
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7 6 sampling methods for censuses and survey 

SE f the copulation estimates we must bear 

To obtain the sampling errors of ** at ^ P " ere arriv ed at. The estimate Y 
in mind the method by ) which^these m * the sample toto\ bjrto 

SlfTequal to «»« 2? by «d -/)• T ]« 

"£ - -— by *• 

ThOS S.E. CO - 20 X V «. W3 “ 20 X 206 ' 41 “ ± 4110 

S “ My S.E. m = ®5.«/125 - ± of ^ numte 0 f 

In the a„e of V each V (?,) * J Thus 

farms in the size-group, given ^ ^ w , m <m 

Taking the equate = ± «6» 

and hence _ 4 i 6 o/2496 = ± I -67 , v 

\1 h Y' is slightly more accurate than 

second place, use of the «»ctsajnp ^ ^ differen t strata, which may resu 

contributions to the error standard error. 

in raising or lowering the estimate of t - ng whea t follow the same 

The computations for num , er . ar i ance of each size-group total («i> 

rSa^r^u of farms growing 
Ta.lh ^”oM°E B “i h.5 

Size-group 
(acres) 



1-5 

6-20 

21-50 

51-150 

151-300 

301- 


18 

26 

20 

13 


21 

16 

11 


•27778 

•80769 

•8Q000 

•84615 


3-431 

3-837 

3*040 

1-608 
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^UMATION OP tu, „ 

Jie calculations for S E S “ W -"' 0 “™> SICr , . 

«»“P. for example, ^ (U) " re *™> » Table 7.6.b. For , 

V (u ) — 7 o C ar £ es t size- 

W e then have * X °’ 84615 X °' ]5: { 85 X 19/20 = j. 608 

If the sampling errors^ f u ^ * V ' 12 ' 83 = ± 71 *64 

Example 7.6.b ^ 

Estimate the samnl' 

sample afS^^ < ° bt ^ ™™ber 

7 « T ^ e ^ om P ut ations follow i t, 

he val Ues obtained are ^ 38 those for S.E. (Y') in Ex fe 
S.F /V'\ , . -i 


S.E. (Y') : 
S.E. (U') : 


" i 4320 acres 
i 75-2 farms 


The value for s E fY') • ^ 75-2 far ms 

* foe similar eefimate 

», it sri 

not a reflection of foe relativp 6 ^” 068 3re mere ty errors^ because of the one 

sample, and this partied r J r0p0rtionate numbtlfT" 3 " 0 " 8 ’ ha PPen by 

.S?^£- - “ S « »Xve S” h t 

7-7 Pooled ecMmate of en ™ • 

In a stratified samole -m, ' Ma, f*is of va,i anc ,. 

T .. . s - E -m~,,JL=i 

. is the same as e * n 

'" repla “ d by ” ,he *» 
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sect. 1.1 s “ ,PUNO M “ H0M ^luined if the estimate- of «* 

The «* .«o*f«o 5 

f™» *»' “ baald. This is ^yZ sum o' «» 

freedom on which tney a means an d divining y 

SqUare t d f SSrf freedom, i* by * -£ the aboV e sum of squares 

This - 

t0 the sum of squares^ _^ + ^ = 5(y -tf 

This is easily verified if we ^cognize^ ( )} _ j S {y) 

S m {Si-S) = S {j ‘ f the ro ducts of the means 

The first term on the riband mea^ 

an d totals of the sep^te*t , . g the correction Jo** i ^ form 

Of What is known as the analysts J 

Table T.7.a. 


le 7.7 .a. within strata 

7 7 a _ Analysis of variance betw Mean 

TABLE »*** a Sum of square 

Degrees ot squares 

freedom _ A 

x xy.sdy)-y s{y) 

Between strata • . ' B 


Within strata 
Whole sample ♦ 


Whole sample • when the numbers 

The most convenient f»™ the sums of 

ssstir Tl — beM ““ 

of the whole popn^ g ecbon 8.10. . strata contains only 

Stta A fuAher simplifioationjs of squares^ofSs^d^ 

two sampling uni • differe nces of the y s of P f uares will 

“* 

be 1 S (d 2 ), and consequently, 
one 2 degree of freedom, - > 


s s, (y - y*)' 
S(y*) -yS(y) 


C = S* 


. *.* = t S «‘. on , y in sampling but 

tk. -*>* *r*r& ss * way of de, " ram,ng 




thf . w ir,c sampling error 

ifferent components of vo • sect. 7.7 

v.riic,t.tS, a ^ ”»r »= «*%*»«, 

" *" "■' *-» „ equal „ have from . " 

nave, from equation 7.6.b, 

V (y) — j_~~/ S (m Si 2 ) 

^ ■» »<• r„ 

xr sr^'4i^r v^* 

in the estimarion J l CStimates - Use of the 23 Y ^ WlU be Uttle 

0 ^b^c P ett°n **?**'^ 

misleading anef 5“? °J the WWon, ,. g . a”" "PPMe to tte 

b«, wb J ZTflT m,y be ™r 
sm ?C« i “ p 5 c "~”Sm^ «»■ «STj 1' 

vSa d „” : „;x mS? E ™CL e s 

Ztejz “Sr'• ^ 

advisable. USe a P ooIe d estimate, weighting . tr fj. vanarice s 

mup .. . ’ gm mg by »,• if this apnear* 

**■ be „L. h,s « ■*«• ** p^V^LTZS 

Example 7.7, a 

“• h ~ z a % 
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_ methods fob censuses and surveys 

rr 17 SAMPLING METHODS rKj 

SECT. 7.7 SA arranged by districts the 

Since the systematic well » ff ^ 

sample will are too 

numbers m «* «'¥'“ b „, for the larger sme-groops *h.e m“> p 

therefore appmpr,a <dy u« P f()r , he hugest srae-gronp are 

The district totals anu 
Table 7.7.b. 




District: 
^o. of farms 
Total . 

Mean . 


1 

1 

in 

in 


2 

4 

487 


3 

2 

315 


4 5 

9 0 

1937 — 


121-75 157*5 215-2222 


6 

1 

72 

72 


7 

0 


All 

17 

2925 

172-0588 


114 m-'o “ 

in • . rr 1 , 1 s. 7 7 c The sum of squares 

The analysis of '““f £ gT«“ xV15 + •. ••- 

x 172-0588 --- 72, 'f T ' Wr( _ Q f freedom the mean squares. 

and division by the d g , N o WITHIN DISTRICTS OF THU 

Table ,.,-c-Analtsii, of »-““^ (d Tta of Table 6.7.,) 

«®1 *““ GES °' ““ 1 a, Sum Oi ““ 

squares b H 


Degrees of 
freedom 


Between districts . 
Within districts 
Whole size-group 


4 

12 

16 



72,067 


10,174 

2,614 

4,504 


Whole size-group • the overa ll mean 

square, stratification. analysed in the same manner, 

gain m accura y Y ^ 301- can be ana y ^ little 

f “ d 21 - * he 




ESTIMATION OF THE SAMPLING ERROR 


SECT. 7.7 


Table 7.7. d—E stimation of sampling errors of the estimates 

of Example 6.7 


Size-group 

(acres) 

»« 

s* 2 ' 

V{S,(y )} 

gi 

V(Y<) 

1-5 , 

0 





6-20 

3 

0 

0 

40,000 

0 

21-50 

6 

53*5 

320 

3,600 

1,152,000 

51-150 

26 

159*2 

3,930 

400 

1,572,000 

151-300 

40 

564*2 

, 20,310 

100 

2,031,000 

301-500 

! 43 

1,703 

58,580 

25 

1,464,000 

501- 

17 

2,614 

29,630 

9 

267,000 


135 




6,486,000 


variances added, in accordance with formula 7.5.a. The estimated standard 
error of the total acreage is thus 486,000 = ± 2550. 

It will be noted that the acreage of wheat in the smallest size-group has 
been assumed to be zero, and that the estimated zero error variance of the 
second size-group is based on only two degrees of freedom, and is therefore 
very inaccurately determined. It is clear, however, from the nature of the 
material and the trend of the variances in the larger size-groups that this variance 
must be small. 

In the computation of the standard error of the number of farms growing 
wheat, allowance should also strictly be made for the stratification by districts. 
If the number of farms in each size-group district sub-class were large this 
could be done by calculating the variance of each size-group total of farms 
growing wheat by the method of Example 7.6. a. The numbers in many of 
the sub-classes are so small, however, that the approximation resulting from 
using the estimated proportions p to calculate the variances will be unsatisfactory. 
In this case it will be sufficient to ignore the district stratification, calculating 
the variance of each size-group total of farms growing wheat from the proportion 
in that size-group, and then proceeding in the same manner as for wheat 
acreage. The resultant standard error will be found to be ± 88*9. 

Example 7.7.b 

Estimate the sampling standard errors of the regional and varietal means 
of the yields of potatoes given in Table 5.23.a, and compare them with the 
standard errors already obtained in Example 7.5. 

As mentioned in Section 5.23, the sample can be regarded as stratified 
by regions (but not by varieties). The regional standard errors are therefore 
derived from the analysis of variance within and between regions. This is 
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SECT. 7.7 


SAMPLING METHODS FOR CENSUSES AND SURVEYS 


given by lines (1), (4) and (5) of Table 7.7.e. The required standard errors 
are therefore 25/174) = i 0*17, etc. 


Table 7.7. e— Potato survey : analysis of variance of yields per acre 



Degrees of 
freedom 

Sum of 
squares 

Mean 

square 

Between regions (1) 

4 

173-7 

43*42 

fBetween varieties (2) 

16 

987*7 

61-73 

Within ! Within varieties (3) 

880 

3713-6 

4-22 

regions ] 

— 



ITotal (4) ... 

896 

4701*3 

5*25 

Total (5). 

900 

4875-0 

5-42 

Between varieties (6) 

4 

887-3 

221-82 

f Between regions (7) 

16 

! 274*1 

17-13 

Within J Within regions (8) 

880 

3713-6 

4*22 

varieties | 

— 



ITotal (9) 

896 

3987-7 

4-45 

Total (10). 

900 

4875*0 

5-42 


The mean square within regions, 5*25, is 1*24 times the mean square 
within regions and varieties, 4*22, already given in Example 7.5. This latter 
mean square is obtained from an analysis of variance within and between 
the regional-varietal groups, lines (2) and (3). This would have been the 
appropriate mean square for estimating the errors of the regional means if 
the sample had been stratified by regions and varieties. 

The exact standard errors of the varietal means cannot be obtained by any 
simple process. If the sample were fully random, and not stratified by regions, 
the correct estimate would be that given by the within-varieties component 
of variance, i.e, by treating the sample as if it were stratified by varieties. 
Stratification by regions will reduce the sampling error of the varietal means 
slightly, but not to any great extent. Consequently the estimate obtained by 
stratifying by varieties and not^by regions will be somewhat of an overestimate 
of the true standard error. 

The analysis of variance within and between varieties (ignoring regions) 
is given by lines (6), (9) and (10) of Table 7.7.e. Approximations to the 
varietal standard errors are therefore given by ^(4*45/393) = d: 0*11, etc. 

It should be noted that although the sample is stratified by regions the 
component of variance due to regions must not be eliminated when calculating 
^ the standard errors of the varietal means. This must only be done if the sample 
is stratified by both varieties and regions. The reason is as follows. If the 
sample is stratified by both varieties and regions the proportions of fields from 
each region in each varietal mean will be exactly equal to the proportions for 
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estimation of the sampling error 


SECT. 7.7 


• u irlt , v Hence only variation between fields of the same 
that variety in the country. .. tbe erro r. In the present case, 

variety within each region contri ; on in each varietal mean do 

however, the proportions of fields fro the g countr y. The deviations are 

not correspond exactly to the prop obtained in a random sample. 

MM rrs -3% "j. 

^ ° f freedo “ ot a 

complete 5x5 table into 

Regions . ^ 

Varieties ... ••• ^ 

Regions X varieties 16 

The reason is «... regions "Zf’rZ^S Stf 

numbers of fid* in «h« dii ''”lrSottnpS«^T»d in particular ^ 

non-orthogonal material is inhertm J rieties C an P 0 nly be obtained by rather 
interaction componentregionX ^ iven by t he subtraction of 

elaborate calculation (Yates, 1934 )• ieties (6) from the sum of squares 

the sums of squares for regions 1 and varieties (j ^ (7) 

for all regional-varietal sub-clas , ( ) e regional-varietal classification 

All the components of errors of varietal 

must be eliminated when calcl j la 8. Qr of P re „ iona l differences freed from 
differences freed from regiona o ncerned only with the component of 

varietal effects, since we standard errors have already been discussed 

variance within sub-classes. Th romoone nt can be obtained by splitting 

in Example 7-5- The mthm-sub.-cta«com]P£3^a»d withi „ varieties and 
the sum of squares tvithm each r«g>on ‘“» and (3) of Table 7.7.= ; 

b^dohig^the'same'for^re^ons ^^^^^jj^^mj^onal-variriai* sub-classes! 

Sr gi ve ,h. - „„ * 

squares (880 degrees of r ldmselfthe various sums of squares (other 

The reader should calculate for him _ These can be obtained 

than the total SU f m b^For” tL purpose 'the means require to be 

Sc^telTa^heater'number of decimal places than those given in 

™£ - - * Td'MrMMMSM 

calculation of the standar erro K> P There ig als0 a fundamental difference 
obtained from the data of Tabl . ■ • • t tbe regional means and 

in the nature of the errors. answer the question: 

varietal means are genuine sampling errors. y 

riivenTn 6 Table < 5 n 23 >r t>5 WO Ttese >r are S due a to reduction'data from a table in 
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SECT. 7,8 SAMPUNG METHODS FOR CENSUSES AND SURVEYS 

much may the*sampt?regSaJ 1 (or regioas ’. b >' how 

the corresponding means over all fields ? If t hp S 6 e 7 pec * ed to dev >ate from 

r re ’ uire F“ ion were ,i,se 

error due to chance variations betweet the St <M ^ ea J ikd 7 *> be in 
grown ? This is not a sampling error Tt ^ -T T hlch the Reties are 
represented all the potatoes fcthe‘L? 8 ** the data «>««ted 
sampling should therefore not be applied ^ ^ COrrectlon for finite 

assumption thrt ° f error is based on the 

- region , is random, and different fields »*** 

distributed. This may be far fS T V g ? Wth ’ etc ” are 3180 randomly 
ri^tly pr wrpngly, as partioil^lv suable f ^ ? !»** * » regarded^ 
on poor soils, and its yield will be less for thi* 0 ™ S<>1 S ’ a tend to be g r °wn 
to, ; be grown by the more progresS Se T A neW wiU tend 

higher yields, even though Abetter thaiTthe’ S n “ y .“ conset l uence give 
the estimates of error merely provide lower V ° ^ 7”^^ Consequently 
words they represent the erroSbu^ ’S t0 T&A ’ in other 
of variance only. Consequentlv i A 0 resic *ual random component 
•conclusions must be tentative emphasized in Section 5.23, all 

tentative. Only experiments can give definite answers. 

7.8 Ratio method : random sample 

- ’"f C ^~ P STJf it ST^ - ».ue, 

formula 7.5.k, with the additional * 1 be taken lnto account. From 

.population, we obtain for a random ^mpk^ tCrm ^ a,lowance for finite 

V (?) = j—/ F2 (Y (y) __ 2 coy (, xy ) y fa) x 

n \ y 2 xy J 

** 

the approach to more^^mplicated ^aseT'^' t*™’ Which somewhat simplifies 

denote by Q the sum of the Squares of the dT-V® Str f 1 1 fied , sa mples. If we 
.given by the ratio line (OMD of Fig. 6 8) ^e^havif ^ 7 ' S fr ° m tbe values 

Q ~ $ (y — fv) 2 

= S{(y~y) — r( X -x)¥ 

= S (y 2 ) — 2r S (xy) + r 2 S (x 2 ) 

the last 

' ,„o expressions being .hose which are suitablll for 




ESTIMATION OF THE SAMPLING ERROR sect wj g. 

.he It tvr”* ,he “ ,imated —^ ***» 

•V- - ^ Q (n - I ) 


We then find 


V(r) = 
V (Y) = 
V(y) = 




W. 

nx- 

X 2 


. w 2 

; (1 —f)ns q z 

-f 2 
- V 


The fim of the above formula; is equivalent to formula 7 5a 
The analogy with the nf ^ a t , a ' * D • g* 

information can now be seen. Aoart W1 ^?i lt su PP le mentary 

approximately unity the onlv t. ^ actor x /^ 2 > which will be 

deflations o/y71 £ “ ‘T ‘T ° f ,he *1“““ » f 

•*■=» * “ ° f ** 

* hy; i a »":rif*- ta - * - be ^ 

be sub/eT Tf? “ Um “ *■ » f * "ill 

variance of y will be ampling is random for both phases the 


V(y) 


(, __ M *t* 

l wj * 2 2 * 


V + 


Wt 


TiphJS Id ?isle P !'Tf' " C f Ulattd “ ab »- *»> the 
second-phase “2 “ * he t0 “‘ ™““ of ^ ^ calculated from Ihe 

varilt 2/Te°to , 4et'T„d f0 T Ia T “ * b « sampling 

(regarded as without error! S , “”P 1,n 8 » f *<“ fit*-phase sample 

variance of oTTSe SclttT ? th V fi ”“-Pb“« sampling 

for all units of the first-phase sample This e°btairiedif y were determined 
a general method of erro^^l'T, 0 ” 1?*““ pr ‘ ,vides 

component of variance » hich will KST ST SeST ? ' 1 “’ S "° nd 

of f Orni,,i0 " » out 

all units of the population !Lse values It 5, ^ * ° f * are known f «r 

the selected units. If this is done bias n u USC • Y* ! ke calcu lation of x from 

= 1 ^;^ ifrr wil1 be co ^ as 

Paris: 

or regression method of estimation can be use! ; 
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SECT. 7.8 SAMPLING METHODS FOR CENSUSES AND SURVEYS 

so included, for which the appropriate method of estimation without 

■"fCS ‘"ThaTa .he variance'V* M of a fo, fixed . J »„»»* 
o,e TlS. Lge of values of «, and if , itself exhib.ts » W^-crto 
ranee V (r) may be estimated from formula 7.5.h and 7 5.e ^bstitutmg 
f g ’ r for V This method of estimation has the advantage of saving 

« no'Slid. An example of the method is given fo, a strayed sample 

^ W fc virtually constant, the sampling error of any ratio ^ 

be rapidly ’calculated once the value of Mr) has been Jabhshed, «t»only 

c / r \ S (x^\ require to be known, formula / • ® t K £ i 

ff ( V (r) is inversei? proportional to «, «. equal to A/*, formula 

used, only S’ (#) blhfg required. The effecUv. —yrf 

can be most simply established by calculating ( ) , fmm 

Z vari Z batches" of data and calculating the resultant values of A from 

formula T.b.f. This is in general preferable to using formulae 7.5.1 and 

7.5.f directly. 

Example 7.8.a 1 £ . 

Estimate the sampling error of the ratio estimate of the acreage of wheat 
from the random sample of farms (Example 6.9.a). 


: 902,958 
= -3044860 


S (a 2 ) 

r 2 


We have 

S (/) = 207,261 S (xy) 

, p = -1522430 2r 

Q = 49,643 

S.E. (r) = V{(1 - J /20) 125 X 400-35} 


r 5,061,734 
= -0231779 


Sq 


? = 400-35 


15,114 

S.E. (Y) = 273,074 X 0-01443 = ± 3,940 


± 0-01443 


Example 7.8.b , 

Estimate the sampling error of the estimates of Example 6.9.b. 


We have 

n = 43 f 

S (y) = 799 y = 18-5814 * 

S(y*) = 22,065 S (xy) 

y S (y) = 14,846-5 yS(x) 

S( v -v) 2 = 7218-5 

P = 0-233149 2r 

Q == 4023-6 sq 

S.E. (100 r) = ^ \/(0*8677 X 43 X 95-80) = ± 1-744 


43/325 = 

: 79-6977 
; 76,965 
: 63,678-5 
= 13,286-5 
= 0-466298 
■a — 95-80 


0-1323 g = 7-5581 
S (x) = 3,427 
' S (a 2 ) = 328,323 
* S (x) = 273,124-0 
S (x — x) 2 = 55,199-0 
r 2 = 0-0543585 
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SECT. 7.9 


Had the sample been a random sample of individuals the formula of 
Section 7.4 would have been applicable, giving 

S E (100 r) = 100 VC 0 - 8677 x °' 2331 x 0 ' 7669 / 3427 ) = ^ °' 673 

random sample without supplementary m orma 10 

S ( y - yfl(n - 1) = 171-87 S {x - xfl(n - 1) = 1,314-3 

S.E. (X) = 7-5581 X V (° -8677 x 43 x 1314 ‘ 3) = ± 1,673 

S.E. (Y) = 7-5581 X V ( 0‘ 8677 x 43 x 171 ' 87) = 111 6 ° 5 ' 3 

The standard error of the number present in the reserve can be calculated 

in the same manner from the sum of the squares of the deviations of (# 3')> 

which in turn can be calculated directly from the separate 
Tn the nresent case, where the separate values of (x . 

and where S(x - x)(y —y) has already been calculated, it is more convemen 

squares of deviations frore the su m s of .qu^es 

and products already calculated (see Section 7.5). Thus 
O ((x - V )-(ji - f)}* = 55,199-0 + 7,218-5 - 2 X 13,286-5 = 35,844-5 

S.E. (X - Y) = 7-5581 X V{0' 8677 x 43 x 35,844-5/42} = ± 1349 

Note that x and y are not independent, being derived f the same 

rvW ” «)' “ Y >. 'putting cov(XY) equal ,o 13,286.5/42 g.ve. 

the same result as above. 

7.9 Ratio method : stratified sample with uniform sampling fraction 

(a) When the ratio is assumed to be the same for all strata. 

Instead of taking the sum of squares of deviations from the general ratio 
,ine!rltions lore a series of Hues fsuallel to this h„e ant1 P = ,^ 

the points representing the strata means must e a , 
being replaced by ft t. 

Thus . 

2 : -'££?-*> *■-»•>+- * 

The sums of squares and products will be recognized as the sums of squares 
Id proTucts whhin strataf similar to those already obtained for the y variate 
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SECT! 7.10 SAMPLING METHODS FOR CENSUSES AND SURVEYS 
information^ eSt ‘ mate ° f error for a stratlfi ed sample without supplementary 
(b) ^ t 6 a n . the rati ° is P ermitted to as ™ different values for the different 
30 The common f is replaced by r, corresponding to the different strata. 




I i O i 




The divisor n — t stands, 

being analogous to that already discussed in Section 7 6 ^ Sltuatlon 

bu,4?,S X t STiT tOM,S , X - not k ”°'™ f » 1“ different 
«,»« be need for caleuiatin” 'thf^ 

b,ThL™“pZin g “”2 w““ ,he me,hod “* in ” ti0 " 


Example 7.9 

Estimate the sampling errors of the estimate of Example 6.10. 


The contributions to Q from the 
District Q ( 


1 

2 

3 


5,107-59 

1,550-71 

7,963-98 


six districts are 
District 
4 

5 & 7 
6 


Qi 

20,566-56 

3,737-14 

1,080-92 


Hence 


S.E. (Y) 


273,074 Iff 
15,114 V { (/ 


Total 40,006-90 


1 \ 19 _ 40,006-90 

20) 


3,610 


7.10 Ratio method : stratified sample with variable sampling 

of *?^i”aL iS ,‘if” d dlffere “ T * 1 ” S in *be 


fraction 

variance 


V ( Y ) - 2 £ {jS , j, (1 fi) tli Sq? J 

= 2 {gi 2 (1 — ft) m % 2 } approximately, 

V (f) = V (Y)/X 2 X 
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SECT. 7.11 


Here the % 2 are estimated separately for each stratum, using the value of the 
ratio appropriate to the stratum and the divisor m — 1 for Qi. 

If the ratio is assumed to be the same for all strata the same formula may 
be used, with the exception that the Qi are calculated using the general ratio, 
with divisors m — 1 as before. 

For an illustration of the application of these formulae see Example 7.17. 

7,11 Ratio method : integral values of the supplementary variate 

When the supplementary variate x can only assume small integral values 
the above formulae for Q can be simplified by classifying the data according 
to the value of x . The most common instance in censuses and surveys is in 
surveys of human populations in which the sampling units are households and 
information is required on individuals. 

In the analysis of data appertaining to individuals the working unit in 
the analysis will commonly be the individual, although the sampling unit is 
the household. Clear distinction must therefore be made between the values y 
for the sampling units which in households of two, for instance, will consist 
of the totals of pairs of individuals, and the values for the individuals. These 
latter values we will denote by z, with the convention that [z] for families 
of more than one unit represents the total of the individuals in this family, 
so that [z] equals y . With this notation r = 5. Suffices will be used to indicate 
size of family ; n ly n 2) . . . to denote the numbers of families of the different 
sizes. 

No difficulty should be found in transforming the formulae for Q into 
a form suitable for computation. In the case of a random sample, for instance, 
we find 

Q-$i(y 2 ) + $ 2 (y*)+ . . . -2r{5,00+25,00+ . . . } 

+ r 2 (n 1 + in 2 + ... ) 

^S 1 (z^) + S 2 [zf+ . . . -2 z{S 1 (z) + 2S 2 (z)+ . . . } 

. ,+ # 2 (% + 4a 2 + 

= (z - i*) 2 + S 2 ([*] - 2i 2 ) 2 +...+*!(*! — ~zf 

+ in 2 (z 2 — zf + . . . 

It will be noted that in order to calculate Q the quantities [z] are required. 
In the event of the survey material being recorded on punched cards, each 
individual will normally be assigned to a separate card. The required totals 
can then be obtained on the tabulator by sorting for family designation, family 
size, and any stratification which is required, and controlling on family 
designation, either printing the totals or reproducing them directly on to new 
family cards. The latter procedure will be advantageous if further analysis 
is required on the characteristics of families regarded as entities. 

This type of analysis can be confined to a special type of individual, such 
as adult males. If punched cards are used a card count will have to be 
introduced to count the number of the , special type occurring in each family, 
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which may be termed the “ partial size ” of the family. There is then the 
minor complication that families of different partial sizes will occur together 
in the tabulated results. This is overcome if the results are reproduced on 
cards, as the “ partial sizes ” can be punched on the cards, and the cards 
subsequently sorted by partial size and listed. 

The last of the above forms for Q separates the various components of 
variance. The first term gives the contribution to Q arising from variation 
between families of one, the second that between families of two, etc., and the 
first term of the second set gives the contribution due to the average deviation 
of families of one from the general mean, etc. If the sample were stratified 
for size of family the second set of terms would be omitted. They will also 
be omitted if the error of a mean standardized for distribution of family size 
is required. 

The above formula for Q can be set out in analysis of variance form, as 
in Table 7.11. 


Table 7.11— Analysis of variance form for Q for integral values of x 


Between families of size 1 
„ „ „ „ 2 

„ „ „ „ 3 


Degrees 
of freedom 

n x — 1 

n 2 — 1 

w 3 1 


Sum of squares 
Si(* “ *i ) 2 

S,(M - 2i 2 ) 2 

s.(M - £ 3 ) 2 


Between means of families of different 

sizes. t — 1 nJjs-L — l) 2 + 4n 2 (z 2 — ^) 2 + . . . 

The mean square of the first line then gives the estimate s ± 2 of the error 
variance of families of size 1, the mean square of the second line, divided 
by 4, the estimate of the error variance of family means of size 2, etc. The 
means of families of a given size can thus be compared for different parts of 
the population, remembering that the further divisors for s 2 2 , etc., depend 
on the numbers of families and not individuals entering into the means. 

7.12 Regression method : random sample 

The estimation of error in the regression method follows much the same 
lines as in the ratio method. The sum of squares of deviations from the ratio 
line is replaced by the sum of squares of deviations from the regression line, 
and the divisor n — 2 is used instead of n — 1, since an additional degree 
of freedom is accounted for by the fact that the regression line not only passes 
through the mean point, but has its slope determined independently from the 
data. 1 
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by the'equation ^ ^ ° f the de ^io„s f rom the regression ^ 

Q = S(y ~y t f 

^ (y y) 2 b S (x (y — y') 

~ S( V — v) 2 — jf _( x ~ x) (y — v Us 

so that 

and, if errors in b are neglected = 

v (y) 

The error variance of b if ” 

or o, if the vanance of y f or v • 

/ hxed * 1S constant, is 

V(b) = — 

^ue y 0 of y fZ^Zul ^of ZT mW V ™ ce * a standardized 



-f 


V(y„) 


r i 


+ 


v^2 


^SSSsr? *■* s,a “ 


and 


2b 0 S {x ~~ x) (y 


~f) + K 


S{x ~ xf 




T’e- "i = Q!{n — 1) 

closely "to b t USed ^? r standa cdization unle2 his kn^ “ h valu * *o 

~.A t s ‘r * ^ T^r^rz 
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SECT. 7.12 SAMPLING METHODS FOK 

Example- 7.12.* Example 6 12 . a ; 

Estimate the errors of the estimar 

We have = _ 0 . 19316 x 624,739 = 41,229 

y ,2 = 0/123 = 359-59 9.733 =(+ 1-653) 2 

V (ft = (1 - 1/20) 359 -59 / 125 = 2 733 

S E (Y) = 2496 X 1-653 = ± 4126 
’ w m i 59 ! 9 - = 0 - 0001 H 2 = (± 0 - 01054) 2 

V (®) = 3,234,270 

— - *•«— * TOl ”“ ° f ““ ° £ ^ 
lS as'Tn ire 1 *' for the teate ' 

Wh “ >, “or i ^e- 2 xe 2 ,oe9- t -8^« = 93,4 M 

s , = 93,424/24 3893 resultant standard errors 

Th , values of d. «- ’ 

of the various estimates, are as 


Sample plots only 
Ratio method 

Regression, 6 0 = 
Regression, b ■■ 
Regression, b 0 


•6327 

•55 


Variance 
per unit 

. 4,803 

. 4,230 

. 3,893 

. 3,579 

3,454 


S.E. (total volume) 
710,000 cu. ft. 
£ 602,000 cn. ft. 
-i- 639,000 cn. ft. 
-j- 613,000 cn. ft. 
603,000 cn. ft. 


Relative 

efficiency 

72 

82 

89 

96 

100 


The relative efficiency of the various methods a^ 

i rearession coefficient ne T-fere there is an 

»of—*>" -- a “ utacy ' H 
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?z r is\x Sei: i c tS r :t g to : hange in the ^ 

small differences in the regression c ffl ^ attempting t0 take account of 
« to determine 4 vely L™®, 01 ''" for PMs of the data, 

will give a satisfactory adjustment ’ "““"“My near the correct value 

S‘E- (■* ~ j 7 ) =-\/(3893/25) = d-12-5. ‘ 

The actual difference — ik.r ;» l .00 *■ 
data therefore do not by themselvtW t ** Standard error - and these 

existence of bias in the eye estimates tE^V C0ndu , sive evidenc e of the 

standard error gives limits to the bias of f ° f ± twice the 

cent, and + 7 per cent. ~ 40 ' 3 " nd + 9 ' 7 - - 27 per 

of r^ro^unfty to" hTsiidan^ eiTor^s T 19~ ^ ° f ^ deviati °n 

of 1-22 for x ~y, 1 ' 19 ’ which compares with the value 

•As mentioned in Examole 6 19 H Ao 

survey confirmed the existence of h; a « b ’ h m °? extensive data of the full 
determination of its average 3t ^ Same time a more accurate 

woodland. , agC magmtude “d var iation for different types of 

7.13 Re s «ssio„ «, W : shifted and balanced samp|es 

?*“*» **'*"• . for all 

As for a random sample, except that 

,he HnZriT “ in 

different strata i'" 8 fraCtl ° n> regression coefficients different for the 

hi = Si (y - y t ) ( X __ xi)/Si (x - xif 

If a pooled estimate of error is used, 


Qt 


Q/t~ 


v (hi) = sf/Si (X - Xif 
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(c) Variable sampling fraction: 

The method is similar to that outlined for t e ratio me o 

be 

will be zero because of the balancing. 

7.14 Calibration of eye estimates 

When a regression is used to 1 ^^ t 5^ pl Sl 5 ^dw 

Section 6 : 15 w th t £ h * a dSeto 'the “mpling variance of arising from the main 
sampling process, and that due to the variance of x, - * arising 

to errors in * is usually sufficiently small 
to be neglected. To a first approximation it equals 

(»! — xf V (Z>')/f>' 4 

where V (/>') 1. calculated in the » ri “^." r jT«tVXr“iec.ed 

— - 

contribution to V(?) from this source is approximately 

V (i,)/6’ ! 

A closer approximation is obtained by j'^J^TJhieb 
‘ISb^'tov'dtLTv") is the residual variance of x about the regression 

variance of *-* due to varbmce ^ 

calculated from the residual vanance V,(») of * th ““' e sub ^ mple 

weight in the mean is ^ _ n) v _ ( , )/t , 

If the «’s are weighted according to area .or other weights, the last expression 

becomes 2N 

V, (x) (S' («)\ 2 r S (a 2 ) JLilll 

\SAJ)} L{S («)) 2 + i s ' W> 2 J 

where S„ S and S' indicate sunnnation over the whole sample, over the 
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sub-sample, and over the part ef the , '* 4 

respectively. P "" “"I* not included in the sub-sample. 

Example 7.14 

• ■“““ * he Vari ”“ " * « - « — in Eaampie 

is nIX“eT““ ° f V “ a “ e dM » “«> in S' is negligibfe, ^ % _ * 

-*— eye esdmate, 
of calculating the sampling error at the fimf f farms ’ the ra tio method 

wXtd P " P “' d A fee &m a S'J ? PP “ ab,e - A “le »» 

fields on lha* farm; J d all ^ 

and (3) constitute, in the ordinarv raf * • 1 numbers. Columns m 

these tabulated values, and the wdina T® 0 ?’ the * and V values. Usini 
Wlth inclusion oi the facto!-S Z ******* the variance of a ratio* 
ESS"®# th f e ^responding componi ’ o^arif V< ■*} = ^MO.* and 

sS " ™ - - fcra - i 

pre rP ^^[^ P ^ IPa ' ;b< * o s id ^ed e ^n ^Mtion V^^^^ ^^^^h^’^estant^the 
present difficulties, since the samnlino- -‘ 5 ’ makln S use of this fact would 

- - f -- d 

«-» d « ” f » 38 wbj b rr;iS 

610 S '( a )= 1279 o,, 

^(« 2 ) = 15,172 ^^LooJo 5 l(a)= 1889 

Substitution in the formula h • 3 ’ 8 " Sl ^ = 49 ’ 071 

Fields and not farms can reasonabfvT 3 C °? ponent of variance of 0-4137 
estimates may be expected Z Z e ™ * M £ 

his is not the case with V fv ^ c * ^ independent from field to 

often sh„„ c„„s,de„r b Si o r ' he ^ »” «“^e £“' 

tne standard error 

VO *7095 + 0-4137) — j i ^ . f adjusted mean yield is r 
is that rh^ + ' ± 1*46 bushels per acre . therefore 

that due to sampling errors introduced!™ I The main source of error 
to farm. The eye estimates are shown m k ™* Qn in ^ds from farm 
give adequate differentiation between dlff • ^ aufficien tly consistent and to 

fZTT yiddS ° f th£ aS » es ^imate 

standard error of only VO -4137 = £ ^ d has 

’ The Cl ° Ser aPPrOXimati on gives the value 0-733. ‘ 
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7.15 Sampling based 

» Lifted ^ 

for a random sample, ^ = s ( , __ ,-)*/(„ _ 1) 

V(f)=^r 2 

If units selected more than ®a “p^doffs requS!^ 
“‘^"r^onlU™ »e themfore •« 

v (y) == X 2 Sr* In 

V (Y) = X 2 Sr 2 /w 

If the size of the population ^not ^^LT^quIlitative variate 
("r/irwt^ ra random sample, following the notatmn o 

Section 6.16, »\| 


.('-ON 


Hence, if A is known exactly, 
V(X)=A 2 " 


, 0 - 1 )^ 


X 2 n 0 -n 


V (Y) = X 2 V (r) + f 2 V (X) 

. . _ Kht 
If h » not known exactly; "jjh depcndT m'the precise method of 
additional variance, the —^fwi/in general be sufficiently small 

location of the point . ’ a — w /d for A, we have 

to be neglected. Substituting A - n Q jd 

n (« 0 w ) 

vw-'sr 


^ttnate the sampling errors 

^era»"— - *“ " *" “ 

The standard error of the mem yield P“ “ 

c F (?\ = 3-5/V529 -= °' 152 cwt - 
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For th* *• sampling ERRdH 

estimates of Example 6.16. a, 
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V (X) = 25602 x 5 ~ X 7788 

. .. 8317 acres 2 


S.E.(X) 

V(Y) 

S.E.(Y) 


i 57,000 acres 
_ ^,354,000 2 
~529 ' 

: ± 45,900 tons 


/, K2 , 7788 \ 

( + 83i7 M - 7 *) 


^ 84-24 x 10 10 cw, s 


For the estimates of Example 6.16.b, 
V (X) = 640 2 x 2202 X 31.053 


842-2 X 10« acres 2 


S E riri , 3S ’ 255 

v/v'\ X == 29,000 acres 

(Y) 1,409,000 2 x 3*5 2 /59Q j i K na 

SE - 00 - ± 26,200 ,1 ' + 15 ' 7 X 842 ' 2 * !«• = 2636 X 40* cwt> 

•ource, »-c“l,'!,y l x l ““ l ° f Cr0p mt k ”«™ accurately from „ th „ 


V(Y). 
S.E. ( Y) 


1,354,000 2 


529 

i 10,300 tons 


3-5 2 cwt. ! 


.fampkTS,”,,”" ,t"^”’ d “ r r ey of thi » <W>e fill dearly be more e ffi • 

s« L‘ p r nion r y » f ^ Sr 

Xhl s point is discus^ \ g determined at the remain* F . rs > 

. H the sampleTsth thTh ^ “ Secti °* 8 . 1 7 . ^ P 0 ^ 

accurately proportional to the area of L ° P ln a di strict will not be 
If the sample points are Jh , the cro P m that district 

<» Section 7.17. «««™ned from the first-stage 

siae^f lim" W “ h " S,ra,a "•“»> Probabilities proportional 

^ based on the mean square 
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„ T1 , 

' -11 he required. Consequently 

deviations of r withio s«* ™" * 

Sr 2 == n St 


We then have 
and thus 


V (r i) = sr 2 (1 -f‘)l ni 


V (Y) = Sr 2 S {Xi 2 (1 — /i)/ n ') 

V (r) = V (Y)/X 2 

* ic Inrae it may be necessary 

aV The correction for finite sampling is n^ they arc selected- In the 

pr^lwtt*, die formnhe are appronimate only- 
^“die samp,ing errors of the estimate of 

..co^inr^"” * ^ ° ! ' lV ft * e " 

t«« ’.is-^™ 8 ” ^rr or ™:r" m»» 


Between districts 
Within districts 

Total 


Pegrees of 
freedom 

6 

10 

16 


Sum of 

squares 

*04952 

•02649 

•07601 


Mean 

square 

*008253 

*002649 

•004751 


lUIAir 

™* *-: r- «x°rr^^ - ■ - * io> 

^ *».* tes 

- *— w “ tabiy 

the sampling error. 

7.17 Multi-stage sampling the tota l sampling error 

- "-S 3 
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stages, and using the methyl f ■ R SECT, 7,17 

sufficiently small farther . ? m P lin g fraction /' a ?Z * ' 

be increased on acco U m / ? 7^ *> be neglected T'- is not 

are themselves subject to l ° f , he fact that the selected firsi ,. mpi ^ error 
in single-stage sampling Th^- 11 ^ error > instead of being kn ^ Uldt va * ues 
estimate is underT § . lncrea se in the sam „i g . 0Wn ex actly, as 

estimate resulting from the^a 011 be equal t0 /' • ° f whatever 

variance can be LlcuTated hT P SCCOnd and foiled 0 ?”“ in this 

which are sampled hi th by regardln g the selected first ° g Stages ' T his 
Thus, for 3t seeemd'and^folhf C • Un ts a « strata 

units, and n" seSlIf ge rand ° m sa «npling with 5 ^stages, 
the estimate of the sam U1UtS seJecteci from each first T CCted ^ rst ' s tage 
means of the seJoL Phng Varia nce of the firS l t3ge unit > ^ ^ is 
^ is the Valu f s the sop^Z 3^^ * 

and/' and/" are thf ^ ® econd -stage units about the fir ^ first ' sta g e units, 
variance of the me f 1 ?*" and se cond-sta^e samnfin ^ St ' Stage Unit means 

«ws m2: zzs 2:--- fi ».-s4 s Kfr 

— rf *. mean -- ;ruS s ?e unitt , 

Vfy) = 

Example 7.17 

The * 

are the successive temToTth* 0 ^ rad ° m ethod estimate at th 
given for small farms ex P re ssions for S (p"v\ a o J b f drSt sta ge 

respectively ^ mmd y ^ 3-78, 3-30 “ y) 8nd 5 <*"*), already 

The calculation of the v ’ * ’ * 

■» 

. T he values of are f ou J’ * £ the sa ™ for all strata. 

Small farms.- 27-519/21 
Medium farms: 583-81/35 
Targe farms : 2779-1 /« 

We then have, neglecting the factors 

v <0 = (105> x 22 x 1-3104 + Me x 2 s , ^ ’ 

= (±0-0302). + 59 X36X16 -'*» + ».x 0 x 347.39)/ 58 ,229. 



= 1-3104 

= 16-680 
= 347-39 
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and surveys 


sbct. ’ •« SSMM,G M “ ra ° M "* „ and ae variance a. the second 

'2SH££Z~£ " S JZ,tl 3 « « ,hem ' 

ss-V 

^ - - in 

my “ ta4 b “ 

therefore oe 

“" the estimate .£ ■ or « — 

Tanas £.n-*»«“ » oI *- “Sm 


Small farms - 
Medium farms . 
Large farms . 
Total 


Degrees of 
freedom 

21 

35 

8 

64 


Sum of 

squares 

1-1304 

1-3500 

0-9836 

34640 


Mean 

square 

0-0538 

0-0386 

0-1230 

0-0541 


Total • • r aJ « e shown 

and V (Y;) = sri 2 0 - /0 Si ( * 2) . e ^ * i n the 

where the **b wiU ^tpecdvely ^onTequent^ 

333 <3L “"££ £"* 15,245 + »»■ X 3 9 , 9 ®)/5S,»- 

■ V(fi=0.05«(105 ! x 580 + 5t) X 

= (± 0-0391) 2 . . rtV , tn : n ed 
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we obtain foe «, th „ val™?/oi f f given 

no consistent divergence from the vklues of’ t^L 7 , f ' VeIy - These sh °w 
conclude that the bias in the estimation of *7 b !f 7 1 17, and we ma > r therefore 
to be small. A more thornno-h in ?• f ° r by the se cond method is likely 

number of comparisons of the abX'tvnX ^ made by tabulating a 
data. 1C 3bove ty P e fr om various batches of similar 

of the^vlriancf tf fhTesSefrlm^h a P pro P riate for the calculation 

The sampling Mandt“eLr of ^ umn ^ Bd “ (Table 6.19.c). 

V(0-0541/67) = ± 0-0284 This stLto ^ .i " g ° Ver aI1 fields is 

due to bias, but will be appropriate T*Z J ^ n0t indude an ? e ™ 

of such a nature that the major part of the ap Pr° xi mately so, to comparisons 
large differences between sSe- e C D s t t “ ehm / nated - * there were 
within-size-groups variance or the overall^Irt 011 ° f Wbether the P °° led 
comparison in question would have to he , n f is appropriate to the 
other problems, such as how far the differ nsid ® red ~this, however, involves 

- r 1 c? "“.r b r d ” - **«— 

estimate 1 is Wcighted rati '> 

The ratio of the squares is 0-0392V0-0284a-Tgi 6 " Th^ ^h ***¥* 
number of farms, excluding those withom ,!' ThuS about double the 

the same accuracy when unbiased estimate beC -’ *** required to attain 

in a survey of this kind where the T his is Evitable 

to be proportional to the areas of the cmn h Ctl0ns ca f not be adjusted so as 
impossible when a number of crons are P bei " g . sana P led > a course which is 
the necessary informatTon is^^available d “ **“ su ^y, even if 

7.18 Systematic samples 

possible, since the units Xf nc* lEtedX^ 0 ^ ° f 3 systematic sample is 
Approximate estimates can be made in v ** random Wltbin def med strata, 
will suffice for most census and survey work T to 7^ s ™ plest - which 

into strata, and calculate the sampling error as if th^ the matenaI arbitrarily 
random from these strata P g f tbe units were selected at 

ta^Ei^ h will usually be sufficient to 

to ignore any minor and ill-defined fro,mini a treating these as strata, and 
been given (Example 7.7. a). ^ P g • An example of this has already 

points on a fine, or equa%Tpaced {besTo^- aampling ’ e S- equally spaced 
token to contain pairs of successive units so EX 3 " 3re3 ’ ^ may be 
from differences between the 
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coutribu.c, o« deg,* Of <*■ H thctc are »' such diff.renca * «>= «** 
variance per out. ,s therefore ^ ^ 

• a ? taVina alternate differences between 

Since the pairing is Mbitrary, rntend of «k 1^ equivaknt t0 miug mo 

successive units all rf,he estimate of s’ is thereby somewhat 

fn^Sghirfs^ «™8 «o 'act of indepe»d„ce betwc„ 

*‘SZZ “rSiensional-nplig ona 

the strata should consist of s«s o four umB >» ’“ n J h „o ^ 
variability in both conmbu.es 3 degree, 

5 ZSZSttZZ the^turor variance per utri. is 
^ = [S(/)-|S{Si(j)} 2 ]/3 »' 

where «' is the number of strata. . • es if ad t he lines are not of 

In the case of line sampling ^ covered by the sample is known, 

approximately equal lengt . . under consideration will be obtained 

the most accurate estimate £ \ e cdculation of the sampling error should 
by the ratio method. In this case stratified sample estimated 

- units the formuk for e 

becomes 0 = | W) - 2r . i S(4c d y ) + F a I S (dj) 

This will eliminate the varWbfl»J finVestlmaTe'is obtaine^by multiphcation 

st-sssja HSS as" “ 

“rhtsf^oflition of tb. ££££££ «£ 

5 xsr att ju m«* - . 

of error will be fully valid. samoling* the variation in 

In either systematic or strati e be j unless the boundaries 

the length between neighbouring m tl the approximate method 

~ - wi,h “*— 

‘“Throve medtods of estimation 

overestimates of the sampling error, pr sampling that there are no 

the material, and provided mlmaterial in such a 
marked strip effects running! i J ^ points falls on t he same strip. 

manner that the whole o alternative, but rather more complicated, 

If a closer estimate is required, an alternative, ou 
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SECT. 7.18 

procedure is available. In one-dimensional sampling, instead of taking successive 
differences, differences of the type 

d= — y i +y a —y i + y 6 -jv+J, -y s + \y 9 

can be taken. Such differences may be called balanced differences. Most of 
the systematic component of variation is thus eliminated. The number of 
terms included in each difference is to a certain extent arbitrary, but 9 is 
chosen as a convenient compromise. With extensive material there will be 
no need to take overlapping differences, the best procedure being to have 
overlap of the end terms only, so that the y 9 of the first difference is taken as 
the y x of the second. With this convention the sum of all the differences is 
equal to one-half the first and last included terms plus the sum of all the 
remaining odd terms minus the sum of all the even terms. The square of each 
difference contributes one degree of freedom, the divisor being given by the 
sum of the squares of the coefficients, i.e. 7-5. Consequently s 2 = S (d®)/7-5 n'. 



T-f- 


Fig. 7.18 Coefficients for calculating the error of a systematic 

TWO-DIMENSIONAL SAMPLE 

A similar procedure can be followed in the case of two-dimensional 
systematic sampling, the most convenient type of difference being that given 
by the coefficients shown ,n Fig. 7.18. Here again, the margins of the square 
covering one difference may be taken as the margins of neighbouring squares. 
The divisor m this case will be 6j. 6 ^ 

The estimates provided by balanced differences will also in general be 
overestimates of the sampling error, but may be expected to be closer than 
those based on ordinary differences. If there is no wide discrepancy between 
the two types of estimate it may be concluded that the degree of overestimation 
is not likely to be great. More exact estimates can only be obtained by taking 
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supplementary observations at intermediate points allocated either at random 
or systematically. The one-dimensional case has been discussed in detail 

by Yates (1948, A). , . . , 

The above methods of estimation can be applied both to quantitative and 
qualitative data, but in the case of qualitative data, based on either one- or 
two-dimensional point sampling, a rapid estimate of the sampling error can e 
made by using the formulae for a random sample, as in Example 7.15. This 
will tend to give greater overestimation of the sampling error than the above 
methods, but if the parts of the line or area possessing the attribute are small and 
irregularly distributed, with no great variation in density in different parts of the 
line or area, the estimate will be sufficiently good for most practical purposes. 


Example 7.18 

In the 1942 Census of Woodlands the total area of woodland shown on 
the maps was determined for each county by estimating the area of land 
coloured green on the 1-inch O.S. maps. This was done by measuring the 
total length of the E-W kilometre grid lines which fell in green areas. The 
results for O.S. sheet No. 115 covering part of Kent are given in Table 7.18. 
Estimate the sampling error of this process. 


Table 7.18—Woodland areas from line intercepts (cm.) 


Grid 

line 

Length 
of line, 

X 

Length 

coloured 

green, 

y 

Successive 

differences, 

dy 

Grid 

line 

Length 
of line, 

X 

Length 

coloured 

green, 

y 

Successive 

differences, 

dy 

98 

3*5 

0*0 


83 

30*0 

3*8 

+ 1*4 

97 

4*2 

0*9 

+ 0*9 

82 

29*4 

4*1 

+ 0*3 

96 

9*2 

0*0 

-0*9 

81 

29*1 

4*9 

+ 0*8 

95 

12*6 

0*0 

0*0 

80 

28*8 

6*0 

+ 1*1 

94 

15*5 

0*3 

+ 0*3 1 

79 

28*6 

5*4 

— 0*6 

93 

21*2 

0*1 

-0*2 

78 

28*2 

2*3 

— 3*1 

92 

25*2 

0*5 

+ 0*4 

77 

27*2 

2*9 

+ 0*6 

91 

25*4 

3*1 

+ 2*6 

76 

26*3 

2*1 

— 0*8 

90 

31*2 

2*8 

-0*3 

76' 

25*4 

6*3 

+ 4*2 

89 

34*2 

2*7 

-0*1 

74 

25*5 

8*2 

+ 1*9 

88 

34*1 

2-8 

+ 0*1 

73 

25*2 

5*4 

— 2*8 

87 

33*0 

2*6 

-0*2 

72 

24*9 

6*6 

+ 1*2 

86 

31*4 

2*3 

- 0*3 

71 

24*6 

6*6 

0*0 

85 

31*0 

3*5 

+ 1*2 

70 

20 *8 

4*1 

— 2*5 

84 

30*7 

2*4 

- 1*1 



— 







716-4 

92*7 



The successive differences dy of the lengths coloured green, y, are shown 
in the fourth and eighth columns. We find S (dy 2 ) = 63-21 and consequently, 
since there are 29 lines, 

, . ; s 2 = 4 63-21/28 = 1-1288 

S.E. {5 (y)} — \/( 2 9 s 2 ) = ± 5-72 
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estimation OF the sampling error sect> 719 

ConsequeSly t hTrSg d faL?to 1 h km ' ? preseats an area of 1 sq km 

Cm Th ^ , thC eStimated a rea in acres^f jJsS)‘^ 917 1 Iength meas “ red in 
The total area of woodland is therefore ’ 36 ° 247 ' 11 / 100 - 0()0 = 156-57. 

and rh 156 67 X ^ = 156 ' 5? X 92-7 = 1 *.514 acres 

and the standard error rf a- 

6-2 per cent. th,S area ■ 156-57 x 5-72 = ± 896 acreg 

and ^ *r - other maps covering the countv 

will give the standard error for Ae whokco 3 "? ^ reSuItant stand ^ erroS 
maps are virtually independent The 7’ T “ the elTOrs o{ the differen 

lS e eiTOr ° f 3,4 “ nt ^ g3Ve 3 P~S 

the total lengths of th^gridSa* 1 ^idth* 2Sp ° ndbg successive differences d x of 
and S (d x d y ) are required. The latter 6 ° f ! quares 3114 products S (d x 2 ) 

respectively for the map i„ qutiof n ^ t0 be 159 ‘ 23 and + 2-62 
from this map, we find P q ' U g the ratl ° p = 0-12940 derived 

V = 0/28 = 1-1643 

the two methods. The sLp^^mefood 1 ? 6 " 66 “ ' th * error caIcula ted by 
required, even when the totaf area of wooHI a11 that is really 

ofth° f r° Unty and the rati o oftheWthi IS caIcuIated from the total 
of the grid lines. If, however the fir J g ?. d green t0 the total length 

owing t° its cutting the map boundL S'a smllf mU . ch sh( f er than the rfst, 
r the length made up by takinn the rei^ Sma an §l e > >t should be omitted 
map. This trouble part of * e ** on the neighbomfog 

* ». w..ho„ stratification, „„„„ ^ £ *£> ™*»<* of 

7.19 s ampling on successive occasions 

Sections 6.21 TnTe^ ** KC ° td the variances of the estimates given in 

w Tw vrTv;? : sub - sampie - -—— 

t (y) ; + (*-2M ~ pi 2 ) V (x)}/An 

estimate of change'derfoed from foT^h ° CCasi ° n is estimated by adding the 
(j * +X ) = { v (y)-M(26~i)v (xmn 
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^ ,.19 sampling methods for censuses and surveys 

SECie ‘ , i i „„ t u e second occasion. 

(b) Two occasions only: part of the samp e rep ac 

( ) (1 - u r*) V (y) 

V (N = "* (1 - 

or, in the case of unequal numbers on the two. occasions, 

a - ur 2 ) v (y)_ 

V (yiv) = n > + n " (J. - pr*) 

.. variance of the estimate 
With equal numbers on the oc ^^ ately 
chtmge given by formula 6.21 £ ‘ "vM + VM) 

V (change) = ■ ^ _ jir) 

The variance of the estimate g -n by the^rence of the means of the 

and of d»t given be — ’ 

v(f _*) = (l-lr){V(y) + V (*)}/* 

in the last two cases are given by replacing y 
The exact expressions in the 

2 cov ( xy)l{V (*) + ^ CM 

which is equal to r when V (*) = V 00- 

came fraction replaced on each occasion. 

M S ““r” “f!Tv(?r»«bi«ct to the restrictions mentioned ,« 
The limiting value of V (yh), J 

Section 6.22, is as follows:— 

V (yh) = <? V (y)lfM 

The variance of the estimate of change given by ?»-?»-. » 

v (j, -?,-») = V' W o - f 

sampling errors of rite estimate, of Eaampie .-»• 

haJd'ttitSronTfde^"^ 

0-08767. 

We then have 

0-08767 (1 - | X <h847_) _ 0 . 0777 2 

v Gw) = _ _ . 



estimation of the sampling error 


ERROR SEr 7 on 

gam in efficiency by the use nf th ■ t . 7,20 

the fee, occasion is* 31 pTcZt ,nbmu **‘ Prided by ,be sampling on 

Similarly 

v (change) = ~ X 0 '°87 67 (1 - 0-847) 

= °' 05582 

} he variance of the change estimate e 

is 2 X 0-08767 (1 - 0-847)78, ^0-^%^ ° n both oc «sions 

, b77 ’ and the gain is therefore 8 per cent ° f these varian ces is 

the difference of a* overall means rhe “i,„”t' fra ” 

= 2x 0.06767(1 

io/ mno of ffffs variance ,o die 6* variance is 2-042, a„ d * ^ k ^ 

** ^ ss - •*- - v « - - *« 

Example 7.19.b 

Estimate the sampling e™ of the estimate, of Eaample 6.22. 

occasions the ab , Jve 

January the average number of observation* f th SampIln S errors * Excluding 
value of X is 0-664. Since r = 0 -811 ? ~r*-7w “ 9 : 8 “ d the a V^age 

* * U*343, and hence 


<P = ■—^l±_ V / f°-343 (1 — 0-657 x 0-108)1 
2 X 0-664 x 0-657 ‘ ~ 

y{y) was found to be 0-0871, and hence 
V fy A ) - °* 264 X 0-0871 

~0 t 33^T^T = °-° 8202 

V (?n ~ % - x) = iilgjg lx 0-0871 (1 - Q-8H x 0 . 746) 

o-336~>Tfo8 --■— 


= 0-254 


= 0-0729 2 


7.20 The error graph 

vari^t;: tirsr pooled estimates ° f *•— 

error variance is reasonably constant nv P r ^ st,mates are °nly legitimate if the 
the pooling is carried out/In many times of P *T °/ he P°P uIa tion for which 
exist, and in such cases, when the nu^h f ,“f tenaI such constancy does not 
for accurate determination of the erro/vf' ***?•* freedoin too small 

population other procedures must be followed^ ° f dlfferent P ar ts of the 
separately for the different parte! >f sampling errors are required 
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SKT , 20 SAMPLING METHODS BOR CENSUSES AND SURVEYS 

seci. * • u * i tc the error 

The simplest and most convenient devl f J^ ^d^aga'inst some other 
grajh. The" estimates of the to govern the 

^S^of tt o—, and a 

characteristic. d obtaine d in the course of a survey 

Fig. 7.20 shows a graph of this ki were taken and 

ot wiLorm infestation in grass fieUs. 19 degrera of freedom 

the wireworms counted m each co . ^ ^ ^ The estimates of the 

for the determination of sampling , a inst the estimated number 

percentage variance so obtained w P ^ T | e sm00 th curve so obtained 
of wireworms per acre m the va expected in similar sampling, 

was used to provide a tab e of errors that mighty ^ V ^ rf ^ inverse table 

Table 7.20 gives a small absitract giving the fiducial limits associated 

obtained by interpolation from reWor ms. This procedure is approximate 

of pbce hOT ' 

TABLE P.^DlSTBiBUEiON ^“” A “ 0 ^ EL " 

estimates of wireworm populations of oka 

4 IN. CORES v 

(1,000 per acre) 

Population which, in one- 
eighth of cases, would 
give an estimate 


True 
population 


One-eighth of sample 
estimates 

greater 
than than 


Estimated 

population 




___-r 


200 1 

105 

295 

200 

400 

260 

540 

400 

600 

428 

772 

600 

800 

597 

1,003 

800 

1,000 

1 766 

1,234 

1,000 


not less 
than that 
observed 

128 

284 

451 

624 

797 


not greater 
than that 
a observed 

325 

567 

804 

1,040 

1,277 


There are, of corme, various Sren'taSS 

of this type. In biological work t g whic h may in the material 

to other variates, such as logarithms q and thus permit 

in question be expected procedures introduce a number of 

pooling of the estimates of error, auen y 
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ESTIMATION OF THE SAMPLING ERROR SECT, 7.20 

complications ; in particular the means of the transformed variates, when 
transformed back into the original variates, will be biased. They are not 
generally necessary or advisable in sample censuses and surveys. 



Fig. 7.20 —Standard errors per unit core of 4 in. diam. 
(Wireworm Survey, 1940-1) 

o means for 2272 fields grass in 1940 - fitted to data from grass fields 

• means for 525 fields arable in 1940 - - - Poisson distribution 

Reproduced from Yates and Finney (1942, J) with the 
permission of the editors of the Annals of Applied Biology. 
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SECT. 7.21 SAMPLING METHODS FOR CENSUSES AND SURVEYS 

7.21 Sub-sampling for the estimation of error 

In an extensive survey the calculation of the sampling error from the whole 
of the data would be very laborious, and would provide estimates of error 
which are unnecessarily accurate. In order to cut down the work a sub-sample 
of the whole of the material may be taken, or estimates of error may be calculated 
for certain parts of the survey only, e.g. certain strata, with or without sub¬ 
sampling. 

A convenient method of sub-sampling, which is applicable if there are a 
large number of separate strata of approximately equal size and a pooled 
estimate of the error variance per unit is required, is to select a random pair 
of units from each stratum, and to take the differences between the two 
members of each pair. In this case each difference d contributes one 
degree of freedom. If there are t differences the estimate of a 2 is given bv 
S(d 2 )/2t. y 

If the strata are few in number and of unequal size this method is not 
applicable, since the number of differences would be inadequate and the 
different strata would not be represented in proportion to their size. In general 
it is important to see that the contributions to the error variance from the 
different parts of the population are substantially the same in the sub-sample 
as they would be if the whole of the data of the original sample were used. 
For this reason the sub-sample should in general be obtained by the use of a 
uniform sampling fraction over the whole of the original sample. A systematic 
method of selection will usually be satisfactory. 

The taking of a sub-sample in this manner is somewhat troublesome, and 
also prevents accurate comparisons of the errors of parts of the survey which 
are in themselves small and therefore inadequately represented in the sub¬ 
sample. For these reasons the more convenient method of calculating the 
sampling error for certain parts of the population only is often employed. 
This procedure will lead to inaccuracies if the variability of the omitted portions 
is different from those that are included, but these inaccuracies can be reduced 
by selecting the parts to be included on a proper random basis. Thus in the 
1942 Census of Woodlands the sampling error was calculated by selecting 
two counties at random from each of the seven regions, the data of the first 
5 per cent, sample only being used. The surveyed quarter-sheets within each 
of these counties, which were selected on a systematic grid pattern, were treated 
as if they were a random sample from all the sheets of the county. 

With grouped data the calculation of the sampling error from the whole 
of the data may well not present any appreciably greater labour than the use 
of a sub-sample, and in such cases the. whole of the data will naturally be used. 

7.22 Rounding-off and grouping errors 

If a constant grouping interval over the whole of the range is adopted, 
the additional variance per unit introduced by the grouping is ^ of the square 
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estimation OF the sampling ERROR SECTt 7<24 

by a random or systematic process such that park ^ u ♦ 

a substantially random sample of the whnlp nf ^ th b ”^? Ups COnstitut es 

«nce the differences are based on comparison. Lhin 

Table 7.23—Estimation of bias 


Small farms 


Weighted 

mean 

Unweighted 

mean 

Difference 

r-52i 

•484 

- -037. 

1*378 

•362 

- -016 

r *584 

•037 

- -047 

•417 

•< 

•446 

+ *029 

•503 

•470 

~ *033 

.-378 

•361 

- -017 

•576 

•416 

- *160 


Medium farms . 


Large farms 

.ofitr^pier^ss 0 " which overeo “ s *“*. “■»“»»» 

proportions. If the data from a rmmh e f ^ rou P® are represented in the correct 
will be necessary, "* aVaUable ’ no Subdivis i™ 

the overaU m74s fe the HTZr t en S1Ze 'g rou P mean * and between 
comparisons. dlfferent ““ ^ P r ™ d * aU necessary 

7.24 Interpenetrating samples : comparison of observers 

inte^netratin^'samples, ‘ca^froSiL^^^Th; ^ 

appropriate to each observer, and adding these v^riancfs Th 

however, is subject to certain qualifications. In the fins t nlacI t he Pr0C l- re ’ 
for finite sampling must not h P T , nrst place the correction 

components of variance must be included thich affect thT^ ^ th ° Se 
the observers. Thus if a two o i* affect t ^ le comparisons between 

selected “* 7* * 

s “"!nL°z 1 " ‘r »«-«»^ 

e*e„ sit i^ *'" > ° b “' ve " *» « K 
comparison by subdividing the material tjIhatTramto'tf Me' °* ** 

process the JSe™ ££"£ ^ T** 

primary unit separately. , lght be obtained for each 
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SECT. 7.25 SAMPLING METHODS FOR CENSUSES AND SURVEYS 

7.25 Estimation of the sampling error from duplicate staples 

If „ survey i. carried out in two or more interpenetrating parts and the 
result, are tabulated separately; an estimate of be 

obtained from the differences of the two samples-F m ” thlt at 

“"ucTSLt, »ugh. Nevertheless dtey are useful 

„ ; u P n the detailed results are not available. 3 « 

If the two samples are distinguished by single and double das es and^h 
estimate of the population total is given by the sum of the part ^ ^ 
we have Y' = Y/ + Y,' + . • • > and Y == Y^ + £ + " Ae estimate Y 

of the two samples are m t e ra . i „ Y " w ith similar 

of the population total from the two samples is AJ +^. 
expressions for Y„ etc. An unbiased estimate of the error of give y 




'T + • • •> 


where / is ,he sampling fraction for die 

f -t:=s 

variances are proportional to Yi, * 2 > etc -> 


V (Y) = 


Y M (1 -/) {0V _ Y/'^/Yx + <Y/ - Y 2 ") 2 /Y 2 




their sum for Y in the above formula. 


Example 7.25 

,» rhe 19*2 Census of Wootdand, die «- 

if9, ^—s s: £ 

mtirdi^m”^ estimste of .he .odd volume of limber 
for the country is subject. 
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sampling ERROR 

Table 7.25_Vm or sect. 7.26 

FIRST AND SECOND 1 6 B PER IN CEOT REGIONS ESTIMATED FROM 

Woodlands cent - samples of the 1942 Census of 



tHeT ? la d ? ree c ° iumns - The s Um 0 f 

the standard error of the total is 8 ’ therefore from the first formula 

The r , V(|X|X5lX 3738) = ± 29 . 0 m cy 

The sum of the last column is 20*9 j , 

standard error is ’ and therefore by the second formula the 


V(986 X | x l x 1 % x i x 20-2) 


It will be noted that an V = ± 25 '3 m. eu. ft. j 

a Xat'S £ ot3 ariS °" S 

7 * T;:x «^7 -? *• ~ 

Of Single estimates, e a 0 f fh 6 met hods of estimating the samnlino- 

«ve ^ zssrz 

C J broken down m various ways. 



3ECT . 7 .* sampling methods FOB CENSUSES and subvevs 

so that the final tables will contain a largenumber o^estim > could 

of the error variance per unit where app P ’ esent ation of the separate 

as-tt'r-ssi i. - -— 

“"tm every point of vie., <^*£££^‘‘5 SSST See. 
method of presenting the errors of the c P ^ either theoretical or 

This can be effected by making use of ^ particular component to 

empirical, which will enable the staI J da h information available in the 

° f — 1 “ d 

varS^derived frorn . “““e 3S error of any »e» 'h» 

variance per sampling unit is fraction, the error variance and the 

i Anpnf io onlv on the value of the samp g These numbers are likely in 

- =" - -- * he r: 

prepared by the use of formul t o different numbers in the P°pulat 

giving the standard errors correspond g t ^ ^ fora iula on which it » 

or sample. If a table is felt tc.^beund jy ^ ^ ukdy t0 be usedmamly 

lTS b .he P »'e»ces between T, Zy be noted 

:S“Vp1‘ q can be taken » “%i<M by tb. caa. in which 

An example of the use of anempir ^ t0 the magnitude of the 

the error variances are approximately P P in ^ ^ sec tion, » likely 

iLu of the different parts, which, as pomte 




ESTIMATION OF THE SAMPLING ERROR SECT, 7.2t> 

to hold for certain types of area survey. In this case a formula or table relating 
the errors to the entries of a table of totals can be presented. 

There is not space here to discuss more complicated cases, which must 
be dealt with on their merits as they arise. With the more elaborate types 
of sampling the possibilities for presenting the standard errors in the form 
of auxiliary tables are more limited, but even in such cases it is often possible 
to summarize the standard errors in the form of a few relatively simple formula^ 
suitable for rapid calculation on a slide rule. ; 







CHAPTER 8 


EFFICIENCY 


8.1 General remarks 


The methods described in Chapter 7 enable the sampling error associated 
with a sample of a given type and size to be calculated from the data furnished 
by the sample itself. When planning a sample census or survey, we have 
to solve the more general problem of calculating the sampling errors of samples 
of various types and sizes from the data furnished by a sample of a particular 
type and size. We can then determine which method of sampling is likely 
to be most efficient and the size of sample necessary to give the required 
accuracy. 

The determination of the sample size in the case of a random sample from 
a large population has already been discussed in Section 4.31. It was there 
shown that, for qualitative characters which are attributes of the sampling 
units, the number of units required could be determined without any prior 
knowledge of the material other than the approximate proportion of units 
possessing the given attribute in the population ; and that for quantitative 
characters knowledge of the standard deviation of the character in question 
per sampling unit was all that was required. 

The formulae of Section 4.31 apply when the population is large 
relative to the size of sample required. If the population is not large a 
correction must be made to allow for finite sampling. This is most simply 
done by calculating the number of units n 0 that would be required if the 
population were large, and the corresponding sampling fraction / 0 = n 0 /N. 
The required sampling fraction is then given by 


/ = 


/o 

1 +/o 


( 8 . 1 ) 


In this calculation f 0 may be greater than unity. 

The method followed in Section 4.31, i.e. that of taking the appropriate 
formula for the standard error of a sample of size n and rewriting this 
formula to give an equation for n, is a general one and can be applied 
to the more complicated types of sample, using the appropriate formula; 
for the standard errors given in Chapter 7. It is apparent, however, that 
these formulae can only be used if the relevant variances per sampling 
unit are known or can be estimated. In certain cases, also, the formula; 
cannot conveniently be rearranged so as to give n directly. This, however, 
is a minor point, since the required solution can always be quickly found by 
trial once estimates of the relevant variances are available. 

In the following sections we will discuss the problems that arise in the 
estimation of the variances relevant to different types of sample when the 
basic data consist of a sample of a different type. In certain cases data relating 


246 




EFFICIENCY 


SECT. 8.1 


to all the units of a population will be available. This situation does not differ 
in any essential particulars from that in which data are derived from a random 
sample of the population. 

We will here define the sense in which we shall use various terms in the 
subsequent discussion. 

The relative accuracy of two samples which differ in respect of method 
of sampling or size of sample, or both, may be defined as the reciprocal of the 
ratio of the sampling variances of the estimates provided by them. 

The relative precision of two different methods of sampling based on the 
same type of sampling unit may be defined as the reciprocal of the ratio of the 
sampling variances of the estimates given by the two methods when the same 
number of units are taken. ; 

The relative efficiency of two different methods of sampling based on the 
same type of sampling unit may be defined as the reciprocal of the ratio of the 
numbers of units required to attain a given accuracy with the two methods. 

In the case of a random sample from a large population, or a stratified 
sample with fixed strata from such a population, the relative efficiency is equal 
to the relative precision. But if the size of the strata depends on the number 
of units in the sample, or if the population is not large relative to the size of the 
sample, there is a difference between the two concepts. 

The term efficiency is already in current use in the theory of estimation. 
It is there used in an absolute sense. An estimate is efficient {i.e. has an efficiency 
of 100 per cent.) if in large samples it is one of the class of most accurate 
estimates, i.e, estimates with minimum variance. An estimate has an efficiency 
of x per cent, if it has 100/# times this minimum variance. This use of the 
term is analogous to precision in our terminology. The reason why no distinction 
has to be made between precision and efficiency in the theory of estimation is 
that only large populations are normally under consideration, in which case 
the two concepts are synonymous. Since no confusion is likely to arise, we 
shall continue to use the term efficiency when discussing the relative accuracy 
of different estimates derived from the same sample. 

The concepts of relative precision and relative efficiency may be extended 
to cover methods of sampling based on different types of sampling unit, by 
replacing numbers of units by the amount of material included in the sample. 
They may be further extended to cover the relative accuracy for a given cost 
and the relative cost for a given accuracy. 

It may be noted here that the relative precision and relative efficiency of 
different types of sampling should as far as possible be judged from estimates 
of the sampling variances derived from the same set of data. Comparisons 
based on estimates derived from independent samples of different types are 
subject to errors of estimation which are considerably larger, and comparisons 
based on samples from different aggregates of similar material are even more 
subject to uncertainty. No very general conclusions should, however, be 
drawn from a single comparison based on a small amount of data, even when 
a single set of data is used. The relative precision of stratified and random 
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samples, for instance, will depend on the differences between strata, and these 
differences may vary considerably even in apparently similar material. 

8.2 Qualitative data 

If the variates under consideration are attributes of the sampling units, the 
effect of stratification, with either uniform or variable sampling fraction, can 
be determined from a knowledge of the proportions of units possessing the 
given attribute in the different strata. In other cases qualitative variates must be 
treated similarly to quantitative variates, as in the estimation of sampling errors. 

Formulae for the required size of a stratified random sample with uniform 
sampling fraction, analogous to those for a random sample given in Section 4.31, 
can be written down without difficulty. A somewhat simpler approach, however, 
is to estimate the percentage standard error of a stratified sample of any 
convenient size (e.g. the size of the sample of which the data are available) 
on the assumption that the population is large. The size of sample required 
to give any predetermined percentage standard error is then given, if the 
population is large, by the formula 

Size of sample required __ (Actual percentage standard error) 2 . 

Size of actual sample (Required percentage standard error) 2 

Allowance for the effect of finite population size can then be made by 
formula 8.1. 

In the case of a stratified random sample with variable sampling fraction 
the same procedure can be followed, with the exception that allowance for the 
effect of finite population size cannot be made in the above manner. If, 
therefore, any of the correction factors (1 — ft) are sufficiently large to be of 
importance, the approximate size of sample required may first be calculated 
as above and the final size found by trial. Variable sampling fractions, however, 
are not likely to be much used for qualitative data. 

Example 8.2. a 

If a large population of individuals is divided into five strata containing 
equal numbers of people, determine the relative sizes of a stratified and a fully 
random sample of the same accuracy when the percentages of individuals 
giving a positive answer to a given question in the different strata are (a) 70, 
60, 50, 40 and 30 per cent, (b) 10, 7-|, 5, and 0 per cent. 

A sample of 500 people will have 100 in each stratum. The variance of 
the number in the sample giving positive answers will be, in case («), for a 
stratified sample, 

100 X -7 X -3+100X *6X *4+100 X *5 X *5+100 X *4 X -6+100 X '3 X *7=115 
and for a random sample, 

500 x -5 X -5 = 125 
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The ratio of the required sizes is therefore 125/115 = 1-087, i.e. the random 
sample will have to be 8-7 per cent, larger. In case ( b ) a similar calculation 
shows the random sample will have to be 2-7 per cent, larger. 

Example S.2,b r 

Determine from the data of Examples 6.5 and 7.6.a the numbers of 
ferns required to give a sampling standard error of 5 per cent, in the estimate 
of the number of farms growing wheat (a) when the sample is random, (b) when 
it is stratified by size-groups. 


(a) We have p = 54/125 
and 8.1 

«o = 


= 0-432. Consequently, from Sections 4.31 
10,000 x 0-568 

= 526 


0-432 x 5 2 
fo = 526/2496 = 0-210 
- 0-210 
f ~ 1 + 0-210 — °' 174 

n = 433 


(b) We have already found that for the stratified sample U = 1080 and 
S.E. (U) = ± 71-64. If the population had been large, therefore, we should 

^ = 71 ' 64 /V , ( 1 ~ 1 / 20 ) = 73 -5. Consequently in this case 
S.E. % (U) == 6-80. Hence 

_6-80 2 

125 

« 0 = 231 

fo = 231/2496 = 0-0927 
/ = 0-0927/(1 -f 0-0927) = 0-0848 ■' 

n =212 


The standard error of the total estimated from a random sample of 
125 farms is 20^(125 X 0-432 x 0-568 X 19/20) = 108-0. Consequently the 
relative precision of the stratified and random samples of 125 units (or indeed 
of any number of units) is given by 108-0 2 /71-64 2 = 2-27. The relative 
efficiency, when a 5 per cent, standard error is required, however, is 
433/212 = 2-05. The relative efficiency is slightly less than the relative 
precision because we are sampling from a finite population. 


8.3 Random sample and stratified sample with uniform s ampling 
fraction 

The general principle to be followed is to construct an analysis of variance 
which corresponds as closely as possible to that appropriate to the required 
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type of sample. The procedure varies somewhat according to the type of data 
available. 

(a) From the data of a stratified sample with uniform sampling fraction. 

An analysis of variance within and between strata in the form of Table 
7 . 7 .a must be made. The within-strata mean square V gives an estimate 
of the error variance per unit in a stratified sample, and the mean square s 
from the total line gives a similar estimate for a random sample. If separate 
estimates of the error variance per unit have been made for the different strata, 
as in Example 7.6.a, a pooled within-strata sum of squares may be calculated 
bv multiplying the within-strata error variances by the degrees of freedom 
m — 1 for each stratum, and summing the products, or by summing the sums 
of squares directly. 

The formula of Section 4.31 can then be used to determine the si2e oi 
sample, using s x 2 in place of s 2 for a stratified sample, and correcting for finite 
population in the same manner as in Section 8.1. Since for a stratified samp e 
V (yl = s, 2 (1 — /)/», and for a random sample V (y) — s 2 (1 —/)/»> the 
relative precision of stratified and random sampling will be given by the ratio 
of s 2 /*! 2 . The relative efficiency will be somewhat less than the relative precision 
when the corrections for finite sampling are appreciable. 

Th is procedure is approximate in two respects. In the first place, if the 
variances within the different strata are unequal they do not enter into the 
mean square B with quite the correct weights, as already explained m Section 7.7. 
In the second place, a stratified sample has a slightly greater overall variance 
per unit than a random sample from the same population, and consequently 
C is not the best estimate of the variance per unit of a random sample. IN either 
of these approximations gives rise to errors of any importance m the comparison 
of a random and a stratified sample, but it may be noted that the bias m C 
can be almost completely eliminated by calculating s 2 from the formula 

s 2 = {(« - 1) C + B}jn (8-3) 

An extension of* this formula is of use in the case of multiple stratification 
(Section 8 . 4). Method (c) below takes account of both sources of disturbance. 

(b) From the data of a random sample: 

An analysis of variance within and between strata can be made in the same 
manner as with a stratified sample with uniform sampling fraction, and s x 
and s 2 can be estimated as in a stratified sample. 

For this procedure it is only necessary that the units of the sample be 
classified by strata. The numbers of units of the whole population falling m 
the different strata do not require to be known. 

If these numbers are known, Method (c) below can be followed. This 
will give slightly more accurate results at the cost of a little additional 
computation, since allowance is made for the fact that the numbers m the 
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different strata in the sample will not be exactly proportional to the numbers 
in the population, owing to the fluctuations of random sampling. 

(c) From the data of a stratified sample with a variable sampling fraction 
(or any arbitrary values of the sampling fractions): 

Estimates of the average within-strata mean square s 2 2 and of the overall 
mean square must be calculated from the proportions hi — N*/N of the units 
of the population in the different strata. The formulas are 

= £ hi Si 2 

* 2 - + 2 h fi 2 - y 2 - 2 k (1 - ht) st*lm 

where y is the estimate of the population mean derived from the sample, and 
is consequently equal to S hyu The relation of these formulas to the; analysis 
of variance of a stratified sample with uniform sampling fraction will be 
apparent. The terms involving y in s 2 correspond to the between-strata 
component of variance, the last term of s 2 being the correction required because 
the yi are themselves subject to sampling error. This correction will be trivial 
except when the between-strata component of variance is small and there are 
a large number of strata with few units from each stratum. If the hi are put 
equal to mjn (uniform sampling fraction), s 2 will be the same, to order l/n t 
as that given by the mean square C of Method (a), with the exception that in 
Method (a) l&hiji 2 — y 2 is multiplied by a factor nj(n — 1). 

It will be noted that the data need not be derived from a sample in which 
the sampling fractions are chosen with the object of obtaining the most accurate 
possible estimates : any set of data in which the sampling is random within 
strata, and from which the proportions of the units in the different strata, 
the strata means and the within-strata variances can be determined with sufficient 
accuracy, will be adequate. 

Example 8.3.a 

Determine the error variances per unit and the relative precision of a stratified 
random sample with uniform sampling fraction arid a fully random sample 
from the data on wheat acreages of the stratified random sample of Hertfordshire 
farms (Examples 6.5 and 7.6.a). 

The analysis of variance is given in Table 8.3.a. The within-stfata sum 
of squares is obtained directly from Table 7.6.a by summing the column 
Si (y — yi) 2 . The between-strata sum of squares is obtained by summing 
the products of the columns of Table 6.5.b giving the totals and means, and 
deducting the product of the general total and the general mean. These 
means should be taken to two and three decimal places respectively. We 
thus have s x 2 = 349*2 and s 2 = 797*1. The estimate of the relative precision 
is therefore 2*28. 
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Table 8.3.a—A nalysis of variance of the stratified random sample of 

Hertfordshire farms 



Degrees 
of freedom 

Sum 

of squares 

Mean 

square 

Between size-groups . 

5 

57,278 


Within size-groups 

119 

41,558 

349*2 

Whole sample . 

124 

98,836 

797*1 


Example 8.3.b 

Make similar estimates to those of Example 8.3.a, using the data of the 
random sample (Examples 6.6, 7.2.b and 7.6.b). 

The analysis of variance is given in Table 8.3.b. If the N* are not 
known, we have s^ 2 = 488*5 and s 2 = 1329*9. The estimate of the relative 
precision is therefore 2*72. 

Table 8.3 .b— Analysis of variance of the random sample of Hertfordshire 

farms 



Degrees 
of freedom 

Sum 

of squares 

Mean 

square 

Between size-groups . 

5 

106,775 


Within size-groups 

119 

58,129 

j 

488*5 

Whole sample . 

124 

164,904 

1,329*9 


If the Nj are known the calculations follow the same lines as those of 
Example 8.3. c below, and are left to the reader. In this case we find s x 2 = 436 *5 
and s 2 = 1189*2, the estimate of the relative precision being again 2*72. 

Example 8.3.c 

Make similar estimates to those of Example 8.3.a, using the data of the 
sample with variable sampling fraction (Examples 6.7 and 7.7.a). 

Table 8.3. c shows the calculations. The hi are calculated from the numbers 
in the population. These are given in Table 6.6.b, except for the last two 
size-groups, which have the values 215 and 51 respectively. It will be noted 
that we are here considering a sample stratified for districts as well as size-groups. 
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Table 8.3.c—Calculation of the average within-strata and overall 
MEAN SQUARES FROM THE STRATIFIED SAMPLE WITH VARIABLE SAMPLING 
FRACTION OF HERTFORDSHIRE FARMS 


Size-group 

h t 

fii 

n 

y* % 

Si 2 

*<( 1 —*()*<*/«< 

1- 5 

•174 

0 

(0) 

(0) 

(0) 

(0) 

6- 20 

•208 

3 

0 

0 

0 

0 

21- 50 

•143 

6 

4-5 

20 

53*5 

11 

51-150 

•208 

26 

8-2 

67 

159*2 

1-0 

151-300 

•160 

40 

29*1 

847 

564-2 

1*9 

301-500 

•086 

43 

76*6 

5,868 

1,703 

3-1 

501- 

•020 

17 

172-1 

29,618 

2,614 

3-0 


*999 

135 

17-03 

1,249*3 

329*8 

10*1 

301-500 

*811 

43 

76-6 

5,868 

1,703 

6*1 

501- 

•189 

17 

172*1 

29,618 

2,614 

23-5 


1*000 

60 

94-65 

10,357 

1,875 

29-6 

301- 

•106 

60 

94-6 

8,959 

3,243 

5-1 


*999 

135 

17*03 

1,102*0 

474-8 

9*1 


The sums of the products of hi with yi, yp and Si % are shown at the foot 
of their respective columns. We therefore have, since 17*03 2 = 290*0, 

* ^2 = 329*8 

*2 = 329*8 + 1249*3 - 290*0 - 10*1 = 1279*0. 

Hence the relative precision is 1279*0/329*8 = 3*88. It will be seen that the 
corrections in the last column are here trivial, and could well be omitted. 

Size-groups 301-500 and 501- can be combined in the manner shown in 
the second part of Table 8.3.c. We have, for these two size-groups combined, 
s 2 = 1875 + 10,357 - 8959 - 30 = 3243 

We can now insert a fresh line in Table 8.3.c to replace the lines for the 
last two size-groups in the first part of the table. The previous computation 
is then repeated, giving 

sj = 474*8 

s 2 = 474*8 + 1102*0 - 290*0 - 9*1 = 1277*7 
Hence the relative precision is 2*69. 

The amalgamation of the two size-groups containing the largest farms 
has resulted in a considerable loss of precision, the relative precision being 
329*8/474*8 = 0*69. 
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8.4 Multiple stratification 


The gain in precision due to sub-stratification of a sample which is already 
stratified into main strata can be estimated by methods similar to that of 
Section 8.3. An example has already been given in Example 8.3.c, where 
the gain in precision resulting from the subdivision of the size-group 301— 
into two groups, 301-500 and 501- was determined. 

If the data are derived from a sample with uniform sampling fraction 
which is itself sub-stratified, the comparisons can be made directly between 
the relevant mean squares in the analysis of variance, as in Method ( a ). The 
structure of the analysis of variance in this case is 


f Betweep main strata 


Whole sample (s 2 ) 


Within main strata 


(*i 2 ) 


/Between sub-strata 
/Within sub-strata (s 2 2 ) 


The ratio of the mean squares s ± 2 and s 2 2 within main strata and within sub¬ 
strata will give the required relative precision. 

A similar analysis can be constructed from data derived from other types 
of sample with uniform sampling fraction (Method (6)). 

One case of practical importance is that in which both the main and 
sub-strata are arbitrary subdivisions of an area, all the main and all the sub¬ 
strata being of equal size. If there are t' main strata, and t” sub-strata per 
main stratum, with k selected sampling units per sub-stratum, the analysis 
of variance will be of the form shown in Table 8.4. 


Table 8.4— Structure of the analysis of variance in a 

DOUBLE STRATIFICATION 


Between main strata 

["Between sub-strata 
Within main strata < Within sub-strata 
[Total 

Total for sample .... 


Degrees 

Mean 

of freedom 

square 

t f - 1 

* A 

f (*" - 1) 

D 

f t " (k - 1) 

E = s 2 2 

f (t" h - I) 

B = s / 

t't" k — 1 

C = s 2 


If k is small the bias in the estimate s x 2 provided by the within-strata mean 
square may be appreciable. This bias can be almost completely eliminated 
by using the formula 

Sl 2 = {(t"k-1)B+ E}lt"k 
which is derived directly from formula 8.3. 


8.5 Stratified sample with variable sampling fraction 

In the notation of Section 8.3, we have 

NV(y ) = Xsi 2 hi(l-fi)lfi 

The m or ft required for a given accuracy can only be determined uniquely 
from this equation if the relations between the different fi have been decided. 
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It has already been pointed out (Section 3.5) that for maximum accuracy the 
fi should be proportional to o/, but that in many types of material stratified 
by size-groups the fi may be taken proportional to the mean sizes of the size- 
groups. If we put fi = c Xi, where the Xi are in the required proportions, 
the above equation can be written 

is(^A//A/) = NV(y)+i^A, 

The value of c for any required accuracy can then be calculated. If, however, 
a value of c is obtained which makes some of the// greater than 1 the calculation 
must be repeated, omitting the terms for these strata from both sides of the 
equation. 

Alternatively the direct expression for V (y) can be used and the value of c 
found by trial. This has the advantage that the effect of adjustments of the 
final sampling fractions to simple fractions is immediately apparent. 

The relative precision of stratified samples with variable and with uniform 
sampling fractions can be obtained by calculating V (y) for both samples. 
It should be noted that if the fi have been taken proportional to the Si a slight 
over-estimate of the relative precision will be obtained, owing to errors in the 
f*- This P oint has been discussed by Sukhatme (1935, A), but is not of great 
importance in practice. 

It will be seen that for these calculations we only require sufficiently accurate 
estimates of the variances within strata and the proportions of units of the 
population in the different strata. The procedure is therefore the same whether 
or not the sample from which the data are obtained is stratified. All that is 
required is that all strata should be adequately represented. 

Example 8.5.a 

From the data of Table 8.3. c determine the size of sample required to give 
a standard error of ^ 1500 acres in the estimate of wheat acreage, when 
sampling fractions proportional to those of Table 3.7.a are used. 

The Xi can be taken equal to the sampling fractions of Table 3.7.a. 
Tabulating si 2 hi and Si 2 hij Xi, we find 

S (st* hi/Xi) — 2913 S si 2 hi = 329 -77 

Also N V (y) = V (Y)/N « 1500 2 /2496 - 90144, and hence 

c = 2913/(90144 + 329*77} = 2*37 

The total number required in the sample is therefore 135 x 2*37 — 320, 
the number in the largest size-group, for example, being 51 x 2*37/3 = 4o! 

No sampling fraction is greater than 1, and therefore no further computation 
is required. 

In practice the new sampling fractions may well be rounded off, taking, 
for example, all of the largest size-group, | of the next, etc. 
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The direct approach illustrates the way in which results of this kind can 
readily be obtained by interpolation. A first approximation to the number 
required is given by 135 X 2550*/1500* = 390. The standard error 
corresponding to this sample number, obtained by the ordinary methods, is 

1302. The squares of the reciprocals of this and of the original standard 
errors can be plotted against the respective numbers in the samples and a 
smooth curve drawn through these two points and the origin. This curve 
gives the general relation, between sample size and accuracy, and will be found 
to give a sample number corresponding to a standard error of ± 1500 of 
approximately 320. 

Example 8.5.b 

Determine the relative precision of the sample of Table 3.7.a and the 
sample with sampling fractions proportional to Si containing the same number 
of farms. 

The standard error, using these sampling fractions, can be calculated in 
the ordinary manner, and is found to be i 2420. The relative precision is 
therefore 2420 2 /2550 2 = 0*90. There is consequently an apparent loss of 
precision of approximately 10 per cent., but the real loss is likely to be less 
than this, owing to errors in the estimates of the standard errors. 

This apparent loss refers to a single variate, acreage of wheat. If, for 
instance, the acreage of some other crop were taken, the a; would be different 
and the sampling fractions required to give minimum variance would therefore 
also be different. Consequently, if several variates have to be determined, 
a compromise will in any case be required. 


8.6 Supplementary information 

The determination of the number of units required in a sample when 
supplementary information is available presents no essentially new problems. 
It has been shown in Chapter 7 that apart from the substitution of Sq 2 or s t 
for s 2 the formulae for the variances of estimates based on supplementary 
information differ little from those for estimates from similar samples without 
supplementary information. Consequently it will usually be sufficient to 
estimate the appropriate variance by the methods given in Chapter 7, using 
this variance instead of the ordinary variance per unit to determine the size 
of sample. The factor x 2 /x 2 in the variance of the ratio estimate differs from 
unity only because of sampling fluctuations in x 9 and can be omitted. 

When the ratio method is to be used and Vx (r) is virtually constant for all x y 
it will often be advantageous to estimate this variance rather than s q 2 . This 
will generally lead to somewhat simpler and more straightforward computations. 
Any slight bias introduced into the estimates of error will be of little consequence, 
since it will merely result in a slightly larger or smaller sample being taken. 
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We frequently require an estimate of the gain in precision due to the use 
of supplementary information. This is needed in planning a sample survey 

be taken deC Tt 10n t0 b * reached whether supplementary observations should 
i e ^. ken ; J 18 also r ^ u,red ln ^ planning of the computations in order to 

the addTi? Y C UtlIlZati 0 f 1 °J aV3ilable su PP^ eme ntary information is worth 
the additional computational labour. 

, ^ , th ? C3SC of * e egression method the relative precision is very simply 

belngt^r * PCndS ° nIy ° n ^ ^ ° f Ae ? ° rrdati0n coitS? 


strariS S ,h ^ T St bC hEd t0 an y restrictions imposed by 
(ratification the same sums of squares and products being used as in the 

calculation of the regression coefficient and the residual error. The above 
expression is approximate in that the reduction by 1 of the error degrees of 

rdathTm re g ressi ™ has been ignored, but this correction will be small 

relative to errors in the estimation of t . 

precision S'b? ^ *° ° f ^ regre8Si ° n COefficient is used the relative 


1 — r 2 + r 2 (1 — b 0 Jb) 2 

Thcorresponding expression for the ratio method is obtained by writing 

Example 8.6.a 

From the data of Example 7.17 calculate (a) the number of farms required 
to give an unbiased estimate of the mean dressing of nitrogen per acre over 
the farms of the county with a standard error of ± 0-05 cwt.f and (b) the 
number of farms required in each of two equal groups so that the comparison 
based on the unweighted means of the dressing per acre of the tw7groups 
has a standard error of ± 0-05 cwt. g P 

(a) The required number is 67 x 0-0392 2 /0-05 2 == 41. The correction for 

used t Samp - mg 18 t " vlal in t this example. Note that either tg * or & can be 
used to arrive at this result. 9 De 

of the If the requ ’ red nun }her in each group is n, the variance of the difference 
of the means is 2 s r 2 /n. Hence n — 2 x 0 -0541/0-05 2 = 43. 

Example 8.6.b 

fa °“ n ,'btxi o r for ,he rela,i,e d5c “ 


We have b — 0-6327, 
r = 147-36/132-08 = 1-116. 


r = 52,069/V(115,266 X 82,296) = 0-537 and 
Hence the relative precision of the regression 
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method, compared with the sample plots only, is 1/(1 - 0-537 2 ) = 1-40 
Similarly the use of differences (b 0 = 1) gives a . relatl T e ^ 

1/(1 _ 0-537 2 + 0-537 2 (1 - 1/0-6327) 2 } = 1-23, the ratio method (i„ = 1/11 b) 
gives a value of 1-14, and the regression (b = 0-55) gives a value of \' 39 - T ese 
correspond to the relative efficiencies already tabulated except m the case 
of the regression, for which we have here neglected the correction for degrees 

of freedom. 

Example 8.6.c 

Determine the gains in precision in the estimation of wheat acreages from 
the random sample of Hertfordshire farms due to the use of supplementary 
information on acreages of crops and grass, (a) using the ratio method, an 
(b) rising the regression method, without taking account of districts. 

The standard errors, already obtained, are ± 7950 for direct estimation 
without the use of supplementary information (Example 7.2.b), ± 3940 tor 
the ratio method (Example 7.8.a), and ±4126 for the regression method 
(Example 7.12.a). The apparent gain in precision due to the ratio metho 
is therefore 7950 2 /3940 2 = 4-07, and that due to the regression method is 

7950 2 /4126 2 = 3-71. . , 

The value for the regression appears anomalous, since the formulae given 

above indicate that regression may be expected to be at least as efficient (apart 
from the change in degrees of freedom) as the ratio method. The discrepancy 
is. due to the inclusion of the factor x 2 /* 2 in the variance of the ratio estimate 
Using the above formulae with r = 0-8555, b - 0-1932, r - 0-15 , we n 

that the relative precision, compared with direct estimation, is 3-73 tor the 
regression method, and 3-32 for the ratio method. An alternative estimate o 
the relative precision of the regression and the ratio methods is thereore 
3-73/3*32 = 1*12. This latter value gives a better indication of the averag 
value of the relative precision of the two methods. 


8.7 Two-phase sampling 

The only case which presents any new features is that in which the first- 
phase information is used as supplementary information to improve the 
accuracy of estimates of the second-phase variate y. It has already been pointed 
out in Section 7.8 that the variance of a two-phase sample is m this case made 
up of two parts A and B, where 

A = variance due to the first-phase sampling, i.e. the variance which would 
be obtained if y were determined for all the units of the first-phase 
sample, 

B = variance due to the second-phase sampling of the first-phase sample 
(regarded as without error). 
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To determine B the methods given in Chapter 7 for supplementary 
information are followed, the effective sampling fraction being «>,• lo 
determine A we must use the methods given in the present chapter for the 
evaluation of the error of a sample of one size and type from the data o a 
sample of a different size and possibly different type. Thus, if the first-phase 
sampling is random, and the second-phase sampling is stratified with a yarn e 
sampling fraction, it is necessary to calculate the variance of an unstrati ed 
random sample of « x units from the data of a stratified sample with variable 
sampling fraction of n 2 units. 

Once A and B have been determined the calculation of the relative precision 
of different possible sampling methods presents no difficulty. If, for example, 
we wish to ascertain the increase in precision due to taking a two-phase sample 
of % and n 2 units instead of a single-phase sample of w 2 units, we calculate 
what the variance A' of a sample of « 2 units would be if the first-phase sampling 
procedure were followed for a sample of » 2 units. This calculation will follow 
the same lines as that of A. The relative precision is then A I (A + B). 
Similarly the relative precision resulting from the ascertainment of the second- 
phase information on the n 2 second-phase units only, instead of on all the n L 

units of the sample, will be A/(A + B). .... , 

In the simple but general case in which the population is large, and the 
methods of sampling and estimation are such that the variances of the estimates 
at each phase are inversely proportional to the numbers of units, apart torn 
the factor 1 -- n 2 jn ly the above relative precisions are capable j>t simple 
expression, 
and n 2 lfii - 


x — f|, 2/ ^ - x ~ . , , 

If the effective variances per unit are s x 2 and s 2 , with s 2 js 1 


A, we have 


A 


. 1 ,. 

1 


A' 


1 




B 


«2 \ 


Consequently the relative precision giving the gain due to the inclusion of the 
additional first-phase units is 

a ’ _ i ;; 

A + B (1 — X) k 2 + A 

Similarly the loss by not ascertaining the second-phase information over all 
the first-phase units is given by the relative precision 

A _A_ 

A + B (1 — A) k 2 + A 

Representative values of these fractions are given in Table 8.7. If, for 
example, the effective standard error per unit is halved by the use of the first- 
phase supplementary information, k 2 — |. Consequently, if we introduce 
two-phase sampling and quadruple the size of the sample for first-phase 
information only, instead of using single-phase sampling, the amount of 
information derived from a second-phase unit is increased by a factor of 2-29. 
Similarly by collecting second-phase information on only \ of the first-phase 
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U J 11 ( t . S ‘” stead of a11 the units the amount of information is reduced by a factor 
ot 0*57. J 


Table 8.7-Relative precision of two-phase and single-phase sampling 


Two-phase sample:— 
Single-phase sample:— 

n x and n 2 

and n 2 
n x 

K 2 

i 

i 

4" 


i 

i 

£ 

* = i 

1-33 

1*6 

1*78 

0*67 

0*8 

0-89 

X = i 

1-6 

2*29 

2*91 

0*4 

0*57 

0*73 

X =i 

1*78 

2*91 

4*27 

0*22 

0*36 

0*53 


0.8 Sampling on successive occasions 

The relative efficiency of the various estimates can be calculated from the 
variances given in Section 7.19. When the variances on the different occasions 
are the same, the relative efficiency of the various estimates, under the conditions 
set out in Section 6.22, depends only on ju and the correlation r between the 
successive occasions. 

Table 8.8. a gives the efficiencies, relative to those of the overall mean 
ot the adjusted estimates of the mean on the last occasion (a) when there is 
a sub-sample on the second occasion, and (b) with partial replacement, the 
latter^being given for both two and a large number of occasions. Values for 
• a ~ f and I 1 = 7 and for various values of r, are given. With independent 
samples or a fixed sample the overall means are fully efficient. 

Table 8.8.a Sampling on successive occasions: efficiency, relative to 
mean, of the adjusted estimates of the mean on the last 




(*= i 

= j 

r 

Sub- 

Partial replacement 

Sub- 

Partial replacement 


sample 

Two 

Large 

sample 

Two 

Large 



occasions 

number 


occasions 

number 

0 

•25 

•5 

•6 

•7 

•8 

•9 

•95 

1*0 

1-00 

1*03 

1*14 

1*22 

1*32 

1*47 

1*68 

1*82 

2*00 

1*00 

1*02 

1*07 

Ml 

1*16 

1*24 

1*34 

1*41 

1*50 

1*00 

1*02 

1*08 

1*12 

1*20 

1*33 

1*65 

2*10 

Inf. 

1*00 

1*02 

1*09 

1*14 

1*20 

1*27 

1*37 

1*43 

1*50 

1*00 

1*02 

1*06 

1*09 

M3 

1*18 

1*25 

1*29 

1*33 

1*00 

1*03 

1*07 

Ml 

1*18 

1*30 

1*59 

2*02 

Inf. 
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Table 8.8. b—S ampling on successive occasions : efficiency, relative to 

THE DIFFERENCE OF THE OVERALL MEANS, OR TO INDEPENDENT SAMPLES 
(VALUES IN BRACKETS), OF ALTERNATIVE ESTIMATES OF CHANGE 


r 

F = 

: h 

^ = 

- 1 

3 

_ 

Fixed 
. sample 

i 

"< j 

i 

From last 
two occasions 

1 

I 

From last 
two occasions 

0 

1-00 (1-00) 

1-00 (1-00) 

1 ’00 (1 -00) 

1-00 (1*00) 

(1-00) 

■25 

1-02 (1-16) 

1-02 (1-17) 

1-02 (1-22) 

1*02 (1*22) 

(1-33) 

•5 

M0 (1-47) 

M2 (1-50) 

1-09 (1-63) 

Ml (1*67) 

(2-00) 

•6 

1-18 (1-69) 

1*22 (1-75) 

1 *15 (1-92) 

1*20 (2*00) 1 

1 (2-50) 

•7 

1-32 (2-03) 

1-41 (2-17) 

1-27 (2-37) 

1*36 (2*56) 

(3-33) / 

•8 

1-60 (2*67) 

| 1*80 (3-00) 

1 '50 (3-22) 

1-71 (3*67) 

(5-00) 

•9 

2-43 (4-41) 

3-02 (5*50) 

2-21 (5-63) 

2*80 (7*00) i 

(10-00) 

•95 

3-99 (7-01) 

5-51 (10*50) 

3-58 (9-76) 

5*01 (13*67) 

(20-00) 


The increase in precision due to the use of partial replacement instead 
of independent samples or a fixed sample can also be obtained from Table 8.8. a. 
Thus with a correlation of 0*8 replacement of half the units gives a 24 per cent, 
increase in precision on the second occasion and a 33 per cent, increase after 
a number of occasions. With one-third replacement the corresponding 
percentages are 18 and 30. 

Table 8.8.b gives similar efficiencies, relative to the differences of the 
overall means, or to independent samples (values in brackets), of the' estimates 
of change given by y h —- y h _ 1 and by the weighted estimate based on the last 
two occasions only (formula 6.21.b). 

In the estimation of change the difference between the overall means of 
Two independent samples is less accurate than the difference of the overall 
means of a sample with partial replacement. This in its turn is less accurate 
than the difference between the means of a fixed sample. Thus with a 
correlation of 0*8 the weighted estimate from the last two occasions, with 
replacement of half the units, is 3*00 times as efficient as the difference of the 
means of two independent samples, but only 1 *80 times as efficient as the 
difference of the overall means of the replacement sample. A repeated sample 
under these circumstances is 5*00 times as precise as a pair of independent 
samples. 

It will be noted that the estimate of change derived from the last two 
occasions is always somewhat more accurate than the estimate Yh — ?h-v 
With a correlation of 0*8, for instance, there is a gain in efficiency of 12 per cent* 
when ij, = h and of 14 per cent, when t a — 
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Example 8.8 

Estimate the relative efficiency of the various estimates of Examples 6.21 
and 6.22. 

In Example 6.21, r == 0*847 and pt = i. Consequently, from Table 8.8.a, 
the relative efficiency of y w and y is 1*21. From Table 8.8.b the relative 
efficiency of the weighted estimate of change and the difference of the overall 
means is about 2*1. The relative efficiency of the difference of the means of 
the units common to both occasions and the weighted estimate is given by the 
weight of the former, namely, 0*929. 

The relative efficiency of the estimates of Example 6.22 cannot easily be 
determined exactly, owing to the variation in the numbers of units from occasion 
to occasion. With the average value of ^ of ^ and a correlation of 0*811, the 
efficiency of yu relative to the overall mean after a number of occasions will 
be 1*32 (Table 8.8.a) and that of the estimate y h — y h - 1 of change relative 
to the difference of the overall means will be about 1*6 (Table 8.8.b). 

8.9 Sampling with probability proportional to size of unit 

The relative precision of sampling with uniform probability and with 
probability proportional to size of unit depends on the variance laws to which 
the material is subject. The case in which the mean r for fixed x is the same 
for all values of x, and in which the variance of r for fixed x is a function of x , 
may first be considered. 

If the total size of all units is known, we shall be concerned with estimates 
of r. If we put Y ( x)jx 2 = y we have the results shown in Table 8.9.a for 
the three variance laws there given, v being a constant. 


Table 8.9.a—V ariances of r 


Variance of r 
for fixed x 

Variance of r 

Uniform 

probability 

Probability 
proportional to x 

V 

v (i + y)/ n 

v/n 

vjx 

v/nx 

v/nx 

vfx^ 

v/nx 2 

v (1 + y)/wx 2 


In sampling for yield per acre in a crop estimation scheme, for example, 
the variance of the yield per acre may be expected to be about the same for 
large and small fields. If in addition there is no marked difference between 
the mean yields per acre of small and large fields, the precision of sampling 
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with probability proportional to size relative to sampling with uniform 
probability will be 1 4- y. 

If the mean r for fixed * varies with * the variances of Table 8.9.a will 
be increased, and the precision of either method, or the relative precision of 
the two methods, may best be judged by direct analysis of actual data. 

If the acreage of the crop has to be determined by the sampling of fields, 
the relative precision of sampling with probability proportional to size, and 
with uniform probability, will also depend on the variance of the acreages. 
The simplest case is that in which the sampling is used to determine which of 
the fields carry the given crop, and in which the values of x and V (x) are the 
same for the fields of the given crop and for the remaining fields, the number 
of fields being large. The variance of the proportion p of the total area under 
the given crop when n' fields are taken is in this case pq /«' with sampling with 
probability proportional to size, and pq (1 + y )j n ’ with uniform probability. 
The relative precision is therefore 1 + y. 

In the case of sampling with probability proportional to size, point 
sampling will often be used. If the part of the land area which consists of 
fields cannot be recognized on the map, additional points will have to be visited 
on the ground, and these must be allowed for in assessing the total number of 
points required. 

In the more complicated cases of sampling with probability proportional to 
size the same general approach as that adopted in the previous sections must 
be followed, using the data provided by an actual sample to determine the 
relevant variances. If the basic data are derived from a sample taken with 
probability proportional to size, s r 2 can be calculated from the formulae of 
Sections 7.15 or 7.16. The value so obtained may then be used to deduce 
the size of sample required for a given accuracy. 

If the basic data are derived from a sample taken with uniform probability 
of selection, or if data relating to the whole population are available, the various 
sizes of unit will occur in proportions which are different from those of a sample 
taken with probability proportional to size of unit. Consequently a different 
formula is required for the calculation of s,- 2 . The appropriate formula for a 
random sample is 



1 ) x 



S(x) J 


i 

(n — 1) je 


I s (ry) - fu S'(y)] 


where r« S ( y)/S (x). If the individual values of y and r are tabulated the 
second form of the expression is most convenient for computation. 

In the case of a stratified sample the expression within the square brackets 
must be evaluated for each stratum separately. If the number in each stratum 
is small and there is no great difference between the xi, the separate components 
can then be aggregated and divided by (n - t) x. If there are considerable 
differences between the Xi it is best to calculate sn 2 separately for each stratum 
using the separate values in the calculation of V (Y). 
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Example 8.9.a 

From the data? of Table 6.19.a construct a frequency distribution of the 
acreages of sugar-beet fields on old arable land in Norfolk, and hence calculate 
the relative precision of estimates of the mean yield per acre derived from a 
random sample of fields taken (a) with probability proportional to size and 
(b) with uniform probability, on the assumption that the variability of the 
yield per acre is the same for all sizes of field. 

In constructing the frequency distribution, account must be taken of the 
variable sampling fractions at the two stages of sampling. Since the raising 
factors at the first stage are nearly proportional to 7, 4, 2, the fields on the small, 
medium and large farms with a single field of sugar-beet must be counted 7, 
4 and 2 times respectively. Similarly a field occurring on a farm with 2 fields 
of sugar beet must be counted 14, 8 or 4 times, etc. 

This procedure gives the frequency distribution shown in Table 8.9.b. 

Table 8.9.b— Frequency distribution of the acreages of 

SUGAR-BEET FIELDS 


Acreage 

Raised No. 
of fields 

Acreage 

Raised No. 
of fields 

2 

92 

12 

4 

3 

43 

13 

8 

4 

117 

— 


5 

48 

20 

8 

6 

54 

— 


7 

63 

24 

6 

8 

42 

— 


9 

4 

2a 

12 

10 

60 

— 


11 

24 

48 

2 




587 


Following the method of Example 7.1.a for grouped data (the acreages 
being taken as the working units), we find 

x = 6*681, s 2 = V (x) = 30*47, y = 30-47/6-681 2 = 0*681 

Consequently the relative precision of methods {a) and (b) is 1*68. 

Example 8.9,b 

- From the data of the sample of Hertfordshire parishes taken with uniform 
probability (Sample A of Section 3.11) estimate the value of Sr 2 for a sample 
of parishes, stratified by districts, taken with probability proportional to size. 
Make a similar estimate from the data for all 91 combined parishes. 


264 



EFFICIENCY 


SECT. 8.9 


The data for the 91 combined parishes are shown in Table 8.9.c, the 
parishes selected for samples A and B being indicated in the table. 

Table 8.9.c—A creages of crops and grass (divided by 10), and of wheat, 
IN THE 91 COMBINED HERTFORDSHIRE PARISHES 


C. 6c G. 

Wh. 

Dist. 

C. 6c G. 

Wh. 

Dist. 

C. & G. 

Wh. 

Dist. C. 8c G. 

Wh. 

249 

316a 

3 

264 

386a 

4 

363 

958a 

5 380 

491 

335 

1646 


208 

366 


220 

454 

347 

8186 

664 

652 


237 

3116 


390 

907 

363 

741 

226 

192 


227 

319 


251 

466 

405 

582 

256 

272 


220 

238a 


210 

426 

337 

586 

314 

131 


436 

54 


217 

263 

371 

442 

248 

26 


214 

327 


305 

779 

294 

416a 




333 

2286 


230 

5586 



283 

612 


464 

1074 


227 

440a 

6 252 

2256 

247 

624 


232 

313 


337 

618 

307 

284a 

205 

356a 


210 

98a 


282 

710 

374 

7386 

304 

7666 


201 

407 


443 

7756 

305 

244 

220 

362 


265 

466 


250 

518a 

486 

562 

344 

7016 


634 

1264 


289 

495a6 

204 

194 

237 

567 


228 

276 


213 

262 

257 

236a 

204 

5036 


229 

249a6 


242 

565a6 

249 

309 

209 

573 


293 

6866 


416 

8626 

337 

294 

336 

728a 


276 

651 


340 

537 

323 

246 

305 

901 


281 

503 


358 

1085 

350 

390 

330 

515a 


273 

604 


246 

474 



344 

788 





‘393 

776 

7 306 

2906 

226 

434 





267 

702 

380 

244 

220 

506 





259 

410 

272 

237 

345 

838 





388 

862 

384 

318a 







258 

424 

251 

116 


27,304 44,676 

The parishes selected for samples A and B of Table 3.11. b are indicated by 
the letters a and 6 respectively. 


The values of x y y, and r for district 4. (sample A) are as follows: 


X 

y 

r 

363 

958 

2-6391 

227 

440 

1-9383 

250 

518 

2*0720 

289 

495 

1-7128 

242 

565 

2*3347 

1371 

2976 

2-1707 


Thus we have S (ry) — r u S ( y) = 958 x 2-6391 + . . . — 2976 x 2*1707 
= 161*3. The corresponding values for districts 2, 3 and 6 are 63-9, ll6*8, 
and 0*0, with a sum of 342-0. The sample mean of x for these four districts 
is 266*4 and consequently s r 2 — 342*0/(10 x 266*4) = 0*1284, or in acreage 
units 0*001284. 
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This value is considerably less than the value 0*002649 obtained in Example 
7.16. Each estimate, however, is based on only 10 degrees of freedom, so 
that the discrepancy is not exceptionally large. The corresponding value from 
the data for all 91 combined parishes, calculated in the same manner, is 
0*002222. This calculation is left as an exercise for the reader. 

Example 8.9,c 

Compare the relative precision, in the estimation of wheat acreage, of 
samples of Hertfordshire parishes taken with uniform probability and with 
probability proportional to size, by calculating the expected standard errors of 
samples of types A and B of Table 3.11.b. 

The data for all 91 combined parishes give a value of s q 2 of 23,483 when 
districts are eliminated and the same ratio is taken for all districts, and a value 
of 22,427 when different ratios are taken for the different districts. 

In calculating the expected standard error the formula of Section 7.10 
may be used, so as to allow for the variation in sampling fraction from district 
to district. The factors Xi 2 j{Si (#)} 2 may be replaced by \j f? since we are 
considering the average error to be expected over a series of similar samples. 
This will lead to a slight underestimation of the average error. 

We find £ (1 — ft) mj fi 2, = 402*83, and consequently V (Y) = 9*460 x 10 6 
when the same ratio is taken for all districts, and 9*034 X 10 6 when different 
ratios are taken. 

Similarly, in the case of sampling with probability proportional to size, 
from the results already given in Example 7.16, and the value of s r 2 given in 
Example 8.9.b, we find V (Y) = 8*263 X 10 6 . 

The standard errors corresponding to these variances have already been 
given in Table 3.11.b. 

The relative precision of sampling with probability proportional to size, 
and with uniform probability using a single value of the ratio, is therefore 
9*460/8*263 = T14. There is thus a gain in precision of 14 per cent., but 
it must be recognized that sampling with probability proportional to size will 
result in parishes of larger average size being included in the sample. Neglecting 
the disturbance due to the probability being only approximately proportional 
to size, the average size of parish in this case will be given by S ( x 2 )/S («v), where 
the summations are taken over the whole population (or a sample selected with 
uniform probability). This gives an average size of 3244 acres of crops and 
grass, compared with the arithmetic mean of 3000 acres, i,e. an average size 
greater by 8 per cent. 

8.10 Interpretation of the analysis of variance 

The analysis of variance can be interpreted in the manner set out below. 
This interpretation is of particular use when we are concerned with multi-stage 
sampling, and with the effect of change of size of the sampling units. 


/ 
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If the units fall into groups of any kind, such as strata, the unit values of 
a variate y can be regarded as made up of the sum of two parts, one, u, which 
varies from group to group but has a fixed value for all units of a particular 
group, and the other, v, which varies from unit to unit independently of the 
groups. The variances of u and v may be denoted by U and V respectively. 
1 hus u and v may be random sample values from normal distributions, though 
the condition of normality is not necessary. In this hypothetical framework 
zero mean can be assigned to the parent distribution of v without loss of 
generality, but even so the mean of the t/s for all the units of a finite population, 
or for all the units of a particular group, will not be exactly zero, and consequently 
the group means are not exactly equal to the w’s. For this reason the values of 
u and v cannot be uniquely determined from the values of y. 

J he , m ® an squares of the analysis of variance provide estimates of U and V. 
It A and B are the mean squares between and within groups, C is the overall 
mean square k is the number of units in each group, and h the number of 
groups, we have 

A = AU + V 

£ = V 

Hence 

u = (A - B)/k 

We also have, from the analysis of variance, ( hk — 1) C == h (k — 1) B 
+ (A - 1) A. Consequently if a* is the overall variance and Bl * the variance 
within groups we have, from formula 8.3, 






The factor (A 1 )/h is analogous to the correction for sampling from a finite 

population. 

The relative precision of stratified and random sampling will be obtained 
by taking the groups as strata. We then have, with t strata, 

i! _ i 1 ~ 1 U 
V ~ + t ' V 

An alternative formulation is possible in terms of the intra-class correlation 
t.e. the correlation between members of the same stratum when the strata 
themselves are regarded as a random sample from an infinite set of similar 
strata (R. A. Fisher, Statistical Methods for Research Workers, Section 40). 
The estimate ti of this correlation is given by 

A ~ B U 

Ti 


and consequently 


'A + (* - 1) B “ U 


= 1 


t ~ 1 

t * F 


Ti 


Looked at from this point of view, the intra-class correlation coefficient 
may be regarded as a quantitative expression of association which is alternative 
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to the ratio U/V. In this book we shall use the concept of additive components 
of variance, since this appears to be more easily capable of generalization, and 
is otherwise preferable to the concept of intra-class correlation. 

When there is compensation between the different units of the same stratum 
the definition of U as a variance breaks down, and has to be extended (see 
Yates and Zacopanay, 1935, H). Complete compensation occurs when all 
the strata means (or the first-stage units in a two-stage scheme) are equal. 
In this case K U + V = 0, i.e. U = - V/K, where K is the number of unite 
in each group of the population. Negative values of U between 0 and — V/K 
are therefore admissible. 


8.11 Multi-stage sampling 

The sampling variance of two-stage sampling can be divided into two parts, 
A and B , where 

A = variance due to the first-stage sampling when there is complete 
ascertainment at the second stage, i.e. when all the second-stage units 
which go to make up the selected first-stage units are known, 

B = variance due to the second-stage sampling of the selected first-stage 
units. 

Thus the formula of Section 7.17 for V (y) in two-stage random sampling 
may be rewritten 

i-/' , i-/" 


V (y) = 


! + 


o"2 


(8.11.a) 


where 

, 0 '*=5'*-h=T^5"* (8.11-b) 

The first term constitutes part A and the second part B. 

The second term will be recognized as (1 —/")/(l /) times the variance 

that would be obtained with single-stage sampling of the second-stage units, 
the same total number of second-stage units being taken, with the first-stage 
units as strata and uniform sampling fraction/. If/" is small, therefore, the 
first term gives the increase in variance due to the adoption of the two-stage 

process. . 

The above subdivision is alternative to that given in Section 7.17. Fart A 
is dependent only on the first-stage sampling, being unaffected by the intensity 
or type of sampling at the second stage. This fact considerably simplifies the 
problem of determining the sampling errors for different intensities of sampling 
at the two stages : with the subdivision of Section 7.17 the variation in ^ 2 
for different intensities of sampling at the second stage has to be taken into 
account. 

The only new point that arises in the estimation of the relevant variances is 
the determination of part A from the data of a two-stage sample. In general 
this simply requires that the variance per first-stage unit due to the second-stage 
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sampling of the first-stage units be deducted from the variance per first-stage 
unit calculated from thf sample. Thus for a two-stage random sample formula 
8.11.b is used. 

It is often helpful to carry out an analysis of variance on data derived from 
a two-stage sampling process. The situation is simplest when the number 
of sampled second-stage units n" in each first-stage unit is the same. Each 
stage of the analysis then follows the same pattern as the analysis of a single- 
stage sample of the same type. At the first stage, however, the values entering 
into the analysis must be either the means or the totals of the second-stage 
unit values. It is customary (though not essential) to tabulate the sums of 
squares of the first stage in terms of the second-stage units. If the first-stage 
unit means are used, therefore, all sums of squares at the first stage must be 
multiplied by while with totals all sums of squares must be divided by n 

In the case of two-stage random sampling, for example, the degrees of 
freedom and mean squares will be 

Between first-stage units 
\ Within first-stage units between 
second-stage units 


Total . 

We then have 

1 — f ' 

V (y) =-AU 

where ti is the total number of second-stage units and f is the overall sampling 
fraction (n = n' n " and / = /'/")• The second term of this subdivision is 
the estimate of the variance that would be obtained with single-stage sampling 
of the second-stage units, the same total number of sampling units being taken, 
with first-stage units as strata, and uniform sampling fraction. The analysis 
of variance therefore provides a further alternative subdivision of the Rampling 
variance. 

The results are similar with stratification with uniform sampling fraction 
at either or both stages. 

When one or both the sampling fractions are variable, or when the numbers 
of second-stage units in the different first-stage units are unequal, the analysis 
of variance becomes more complicated and the direct approach is often simplest. 
With moderate inequality in the n" the analysis of variance of the first-stage 
units may be carried out on the means, with multiplication of the mean squares 
by n ", or better by the harmonic mean of the w", i.e. the reciprocal of the mean 
of the reciprocals. 

Alternatively the whole analysis may be carried out in terms of the second- 
stage units. In this esse both the means (in terms of the second-stage units) 
and totals of the first-stage units are tabulated, the sums of squares being obtained 


Degrees 
of freedom 


Mean square 


«'(»"-!) s" 2 =V 


+ -— 


(8.11 .c) 
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by the “ mean X total ” rule, i.e. every mean is multiplied by the corresponding 

/; t ^ le second method is used n" can be replaced by n" in the expression 
w'Y 2 = V + n" U given above for the mean square for first-stage units of 
a random sample, or better by « 0 " where 

'< = { s («") - 5 (n“*)/S (»") }/(*' _ 1 ) (8. 11 . d) 

With a stratified sample a value for « 0 " is calculated for each stratum and a 
weighted mean taken, weighting by the degrees of freedom contributed to the 
within-strata sum of squares (Cochran, 1939, A). 

These alternative methods of analysis are not exactly equivalent, but we 
cannot discuss their differences here, beyond stating that the first method is 
generally best when all the first-stage units are of approximately the same 
size and the variation in the numbers of second-stage units per first-stage unit 
is due to extraneous causes, whereas the second method is likely to be preferable 
when the first-stage units vary greatly in size and the number of second-stage 
units per first-stage unit is about proportional to this size. 

The above methods can easily be extended to multi-stage sampling with 
more than two stages. 

Example 8.11. a 

Calculate the expected sampling errors of the wheat acreages derived from 
the two-stage sample B x of Hertfordshire farms of Table 3.11.b, and discuss 
the effects of varying the number of parishes in the sample, with adjustment 
of the second-stage sampling fraction so as to give the same total number 
of farms in the sample. 

Part A of the variance has already been determined in Example 8.9.c. 
We have A = 8*263 x 10 6 . 

The determination of part B requires the evaluation of the variance of the 
r for individual parishes due to the second-stage sampling of these parishes. 
These variances were evaluated separately for each of the 17 parishes of the 
sample, using method (a) of Section 7.9. The mean value of these variances 
V"(r) was found to be 0*003575. 

The equation of estimation of the total acreage is Y = E X* r*. The 
second-stage variance of Pi is V"(r)/«/, and part B of the variance is therefore 
given by 

B = V"(r) EX/ 2 /**/ == 0*003575 x 45*429 x 10 8 = 16*24 x 10 6 
Hence V (Y) = 24*50 x 10 6 . 

Exact treatment of the effects of varying the number of parishes is complicated 
by the fact that the first-stage sampling fractions are bound to vary somewhat 
from district to district, and that the number of farms per parish is also variable. 
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With 9 points there is an increase of 27 per cent, in the variance and with 
16 points an increase of 15 per cent. In the latter case about one-seventh 
more areas will be required for the same accuracy. Against this must be set 
the fact that only fields in which the points fall need be examined and recorded. 
The occurrence of mixed crops and systematic location of the points on a 
rectangular grid will also reduce the sampling variance. 

8.12 An example of a pilot sampling scheme for crop estimation 

In order to investigate the practicability of obtaining estimates of the yields 
of cereal crops in the United Kingdom by the harvesting of sample areas, the 
yields of a number of wheat fields were determined by this method in each of 
the years 1934-1938 (Cochran, 1939, A). Fields were taken in several districts 
each year, one or two fields being selected at random from the fields growing 
wheat on each chosen farm. The selection of farms in each district was not 
random, the farms being taken in the neighbourhood of the centres at which 
the investigators were located. 

The sampling of the individual fields followed the lines described in 
Section 4.29, the fields being traversed in the direction of the rows, along two 
lines selected at random. Two sets of unit areas were taken from each line. 
Each unit area consisted of \ metre of each of 6 contiguous rows. For the 
most part, sets each contained three unit areas, equally spaced along the line, 
with a random starting point. 

The yields of grain obtained in 1937 are shown in Table 8.12.a. The 
mean yield of all the unit areas in each set is given. In order to allow for 
differences in row spacing on the different fields the yields have been reduced 
to a 6-inch row spacing, and therefore represent the yields in grams of areas 
of l metre x 3 ft. Fields on the same farm are indicated by brackets. In 
District III, where three fields from a single farm were sampled, each field 
was growing two varieties which were sampled separately. 

The analysis of variance was carried out in units of the totals of the four 
sets, i.e. on yields of areas of 1 metre x 3 ft. or 0-000226 acres. Thus the 
sum of squares of the sets is multiplied by 4, and the sum of squares of the 
line totals by 2. 

The sums of squares for 1937 can be obtained from Table 8.12.a by 
calculating the sum of squares for each classification, disregarding the others, 
and deducting the sum of squares corresponding to the next higher classification. 
The rule of “ mean x total ” or “ total 2 /(number of units) ” is followed in 
each case. Thus the correction for the mean is ll,706 2 /39 = 3,513,601. The 
sum of squares for districts is 

i (1098)2 + | (909) 2 + . . . - 3,513,601 - 54,224 
The sum, of squares for farms is 

1 (422) 2 + 4 (676) 2 + 4 (619) 2 + 290 2 + 4 (2053) 2 + . . . 

- 3,513,601 - 54,224 = 132,062 
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with one unit per stratum, however, an objective estimate of error is possible 
if additional randomly located units are taken in certain of the strata but 

1948 S Sampl t mU< ? m ° re eIaborate methods have to be used (Yates 

1948 A), and even then the estimates obtained are not fully objective 

of be ™ ted her f that the common practice of estimating the error 

a sample with one unit per stratum by combining the strata uf pairs will 
give an estimate of error which will generally be somewhat greate/than the 
true error with strata of double the size and" two units per Saturn 

An example of the relation between the accuracy of sampling with two 

° ne U f n e y. stratum ’ and ^tematic sampling, 
is given in big. 8.15. The curves (full lines) are based on the variances 

f “ItL „4“ J Th7clT re l “■ 1 f0 °* de ? th ' “ Ch constituting 

sampling unit The cost scale is proportional to the number of units and 
the accuracy scale gives the accuracy of the sample estimate of the mean soil 
temperature over a period. The curves themselves are based on relations for 

° f the , yP e giv r en abOTe - The curve of losses due to errors and the broken 
curves will be referred to in Section 8.18 k 

The material is of the type in which the reduction in variance with reduction 

Jhe SI cnr° f ™ b . e ex P ected to be considerable. This is brought out by 
the curves. The relative precisions of the three types of sampling 8 are given 
by the intercepts of horizontal lines, which are in the ratio 1:1-75 -4^4 
efficl . enCI . eS areglven b y reciprocals of the intercepts of vertical 

he mieH m I" 3 ' 10 1 : 1 ; 36 = 2 ‘ 22 ‘ This Provides an illustration ol 
the marked difference between relative precision and relative efficiency when 

reduction m the size of the strata results in a considerable reduction in variance 


8.16 Efficiency in terms of cost 

In the previous sections we have described how to determine the size of 
sample necessary to attain results of a given accuracy when various methods 

KrTS "17,7 > T “"i 7 dicated *»» relative^fficiencyjfn 

terms of numbers of sampling units) of different sampling methods and 
variations in a given method may be judged. Minimization of the number of 
sampling units or amount of material included in the sample will not ffigeneral 
however, give maximum efficiency in terms of cost Tn ^ 7. g ’ 
method must be so cho.n 

To minimize the total cost it is necessary to know the relative costsTf the 
different operations.. Exact evaluation of these costs is usually^troublesome 
and is only worth, while if an extensive survey has to be inckmken o"’ 
if d series of surveys on similar material is contemplated The ™tt«- • 
competed by the &c «... for m , ny pllrposes 

thdi “ rit'Sc 7° 7 ,ve " se "f per uni< ,hat “ re ’ uired n “»- 

tneless it is not difficult in the course of survey operations, or even in the 
course of a phot survey, to obtaiu data which will serve to give rough estimates 
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of the main components of the costs. With the aid of such estimates the 
efficiency of further surveys of similar material can often be substantia y . 

im When information on the costs of different types of operation is available 
it is possible to determine the values of the sampling fractions, etc., which 
for a given sampling method will give results of the required accuracy for the 
least cost Such values may be termed the optimal values. It is also possible 
^miiTwhich of two methods, each employed in the most efficient 

^ The determination ^optimal" values of the sampling fractions, etc. requires 
minimization of the cost function, and will be dealt with in the next section. 
The choice between different methods when the optimal values of the samp ing 
fractions, etc., are known, or when there are no variants of this type, can be 
obtained directly from the results of the previous sections. 

Thus in the case in which there is the possibility of using supp emen ay 
information, if c s represents the cost per unit of obtaining the supplements! y 
information c 0 the marginal cost per unit when no supplementary information 
;“d(2£ costs Ling token ,» include the marginal com of abstract™, 
and computation), and C\ represents the additional computational cost of 
utilizing the supplementary information (which apart from the above margina 
cost pe? unit may be taken as broadly independent of the size of the sample), 
the total cost of a sample of n s units with supplementary information, excluding 
elements of cost which are fixed for both methods, will be 

Cs = Ci d - Ws (Co Cs) 

and that for a sample of n 0 units without supplementary information will be 

Co = n 0 Co 

Under conditions in which the error variance is inversely proportional 
to the number of units in the sample, the two samples will be of equal accuracy 
when the numbers of units are in inverse ratio to the relative P re “ sl ° n of J h 
two methods with equal numbers. If the regression method of adjustment 
used, therefore, and the sample is random, 

ns/no = 1 — p 2 

where p is the true correlation coefficient between the main and the 

supplementary variates (estimate r). oCGni&nt if 

Hence the use of supplementary information will be more efficient it 

no Co > C x + no (1 p 2 )(^° ~b ^ s ) 

i.e. if N , n 

n 0 (Co + cs) p 2 ^ n 0 c s + C 1 

If the cost of adjustment C\ can be ignored this inequality becomes 

CsICo < p 2 /(l — P 2 ) 

which is independent of *» and therefore of the accuracy required. Thus 
for example, under these conditions, if p = 2 the use of supp emen y 
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costs is considerably less than l^Whh aTatiolffth^ ^ ratio of the 
atio of 7 : 8 (minimum value, with zero value J ? ^ C ° StS wiH be in the 
higher values of p the gains are more marked wk C0St f tio > 3 ; *)• With 
have equal cost when c s /c 0 = 9 / 7 . In Sis ' se U T 1 the tW0 metho * 

, 2 and * the ratio of the total costs will h?S’/oo” the rati ° has the values 
(minimum value 7 :16). tS W1 “ be 21 : 32 and 35 : 64 respectively 


8-17 Minimization of the cost function 


-^unction 

determined byTSSccumcy'required'inthe ° f SampIin S are not fully 

d — ^ mi -Wthe\ost fonctn ’ ** Values can be 

of sampling unhsT', l^ 7 ^thT ^■“ * linear function of the numbers 
approximation, using marginal‘costs^ Th™ eta ’ at least to a first 
a multiple * of tto f lmpIest P-cedure is then to add 

for the variance of the reonirpd *• ^ ex P ression m terms of n,, n„ 
expression with respect to n lt « 2 , . . Tn’Sm’ TV diffe 5 entiate th e resultant 
fixed cost, which is equivalent to min 1- , ThlS mm)mizes the variance for 
exact procedure will be apparent from theVrst Tth™ Variance ' The 
. , Tr " rst of the oases treated below. 

(a) Variable sampling fraction 

a.S&STLx? Ah Mra,mn is * 

Hence C = S a m 

V(Y) = 2 0/ * (I-/,) n . 3/k . 

= S 0,2 (V** ~ 1/Nt) N,'2 4- K(Y.„„. ™ ( 8 • !7• a) 


Diffe *• • = 20)2 W* - W N,-* + K (S __ C) (8 ‘ 17 - a ) 

t equations" t0 the and equating to zero, we have the 

Hence, since «,/N, =/, ~ ^ N;2/w ' 2 + ** = 0 

_A__ = / 2 _ i 

Tb„ ct 2/a/c 2 ~~ ‘ ’' ^ Z/K (8.17.b) 

/SJ 0 ”'*» ”‘IVa. This i, » 

»lv.ng for K. This W SUbsMu “8 the /, in equation S.I?/,™ 
WK) S N/», Vc, = V (Y) + £ N . 
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If »y of fh. ft at. -P- » -«y *• ' 

„ i .a nf the eauation. 


(b) Two-phase sampling first-phase information 

If c . represents the cost per unit of obtain J information, and 

c 2 l additional cost per unit of obtaining th -nd p ^ ^ ^ ^ 

r«s -*- * 

C '■ — 1 ^2 ^2 ^ • 

When the methods of sa “ p ^| ‘(^parTfrom corrections for finite 

variances of the rf units> from the results 

sampling) are inversely proportional to 
of Section 8.7 we have 

1 —h 


v (y) 


n, 


' + 


I(l-3 

«2 V n l 


<V 


(8.17. c) 


(8.17. d) 


Following the above procedure, we find 

« 2 2 fi g s 2 - fl . 

- • • a a — o 2 2 c 2 1 

and « 2 required for a given accuracy can 
where , he values U1 and 2 - • ™ '«* 


Wr 2 c 2 °r 

where k = <hI°v The values of** " in y (y). 

be obtained by substituting for » 2 m terms ot r 

( C ) Two-phase point sampling estimati on (Examples 

We will only consider the special case ansi g 

6.16.b and 7.15). . f ' points and the yields per acre are 

If the acreages are determined fi- 0 - n which n 0 f these points fall, 

determined on fields of the crop determine the nature of the crop, 

and if ✓ is the cost of have, when a proportion 

Jop and the — P« acre - *■ 

Hence ^ p y ( r ) c ’ 


The values of n 0 ' and , are test obtlined by substitution for . in terms of 
Hq in the equation for V (Y). 

unit, the total cost is given by_, , .. 


q r 2 c 


(8.17.f) 
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, With , a rand u 0m or random sample with uniform sampling fraction 
forliukl. 1 Tc; erS T °^ ec0nd - stage units P er first-stage unit, V(y) is given by 

, ^ (/) = U/»' -f- V/« + const., 

where 

U = - o"«/N", and V = <i" 2 . 

Following the previous procedure, we find 

« 2 /«' 2 = «" 2 = V c'/U c" (8.17. g) 

In other words the number of second-stage units per first-stage unit is 
independent of the accuracy required. The values of ri and n required for 
any given accuracy can be obtained by substitution in the equation for V (y). 

1 he same formulas hold for any form of two-stage sampling in which V (?) 
can be written m the above form. Thus stratification with a variable sampling 

raction at the second stage is covered, provided none of the sampling fractions 
are unity. ° 

(<?) Two-stage sampling with probability proportional to size at the first stage 

The solution of the case of sampling from within strata with probability 
proportional to size of unit at the first stage follows similar lines. We find 
that if the cost per second-stage unit is the same for all first-stage units in all 
strata, one condition for minimum cost is that the second-stage sampling 
factions are so chosen that the overall sampling fraction is uniform. Thus 
the use of a uniform overall sampling fraction, which is computationally con¬ 
venient, is justified on grounds of minimum cost. The assumption of constant 
cost per second-stage unit will not in fact hold for the component of cost due 
to travel, since the same number of second-stage units will be taken from any 
selected unit of a stratum, and consequently the travel cost per unit will be 
greater for the larger units. This, however, is not likely to reduce the efficiency 
greatly unless travel costs at the second stage are very large 

The relation between the first-stage and second-stage sampling can also be 
very simply expressed. In the case considered in Section 8.13, in which the 
S f e ..°;, the first ' sta g e units is represented by the number of second-stage units, 
ir an tne second-stage variances are equal we may put ar n - 0 ' 2 _a" 2 N//N* = U/ 

and <r" 2 = V, the costs per first and second-stage unit being taken as a' and c"‘ 
We then have 

, V_N+ V(V/ C ") S N« V (l W) 

J V (Y) + 2 N,- 2 Ui/N/ 

k/ 2 =/ 2 N , 2 U; c"j\ a' 

Since the number of second-stage units tu" per first-stage unit in stratum i 
the same for all selected units, and independent of which particular units 
are selected, the above equations give »;" 2 = V Cf'/U, c", as before. 
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{/) Two-stage sampling of farms and fields 

Any fully general treatment is difficult owing to the fact that the numbers 
of fields per farm carrying a given crop are usually small, and consequently 
what would otherwise be the optimal values of the second-stage sampling 
fractions will give numbers of fields per farm which are not only non-mtegral, 
but which will in many cases be less than unity. 

If the numbers of fields per farm are sufficiently large for this source ot 
disturbance to be neglected, the optimal values of the sampling fractions can 
be simply expressed. 

In order to standardize the notation we may replace the between-larms 
component of variance U 2 , as defined in Section 8.12, by U', and the between- 
fields within-farms component U/ by U". The cost of visiting a farm may be 
taken as c', and that of sampling a field as c ". 

We will consider the case in which the farms are divided into size-groups 
with fields which have mean areas a v a 2t .... We will further assume that 
the mean acreages per field within a size-group of farms of 1, 2, 3, . . . e s 
are the same, and that V («/)/««* is constant for all size-groups and for all 

numbers of fields within a size-group. 

In the first place we find that in this case the second-stage sampling fractions 
within a size-group should all have the same value, which is given by 


//" 2 


__ U" c' N/' 
“ W c" [N/] 2 


(8.17.h) 


where N;/, N;,', Ni 3 ', ... are the numbers.of farms in the group with 1 , 2, 
3, . . . fields respectively, and [N;'] 2 = N (1 ' + 4Ni 2 ' + 9 Nj 3 + • • • The 
ratios of the first-stage sampling fractions are given by 


x 19, 4 f 2 

J 1 __ = - ll ...= H say (8.17. i) 

«i 2 [N/L/N/ « 2 2 [N 2 '] 2 /N 2 ' 

These equations will serve to give first approximations to the relative 
sampling fractions. The relative efficiency of different variants which are 
practically applicable can then be tested by use of the expression for t e 
variance of the weighted mean given in Section 8.12. It will usually be sufficient 
to use the mean acreage for each size-group in evaluating the weights, but in 
evaluating the actual size of sample required the factors a? should be replaced 
by «i 2 + V (at). 


Example 8.17.a 

Determine, from the data of Examples 6.12.b and T.12.b, the optimal 
proportion of sample plots to eye estimates on conifer stands in a two-phase 
sampling scheme in which eye estimates only are made at the first phase, when 
the cost of visiting a stand and making an eye estimate is tjj the additional 
cost of measuring a sample plot. 
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Consideration of this problem would be relevant if it were possible to 
demarcate and classify the stands into conifers of over 20 years of age etc 

ombahd-r ph 0 t °? ra P^ In this «■* if the sampling of stands Is’ Sh 
probability proportional to size, the variances of the volumes per acre, given 

^ *• ^ 

s i 2 = 4803 s z = 3579 

k 2 = 0-745 *3/(1 - * 2 ) = 2-92 cjc a = 1/10 

« 2 /»i = a/ (2*92/10) = 0-540 

Sted SampIe Pl0tS Sh ° U,d bC taken ° n ab ° Ut one ' haIf the stands which' are 

samllW 1 S ’ h ”T eVe "’ in this case no appreciable gain by the use of two-phase 
mphng. If n 2 is the number of sample plots required if no eye estimates 
are made we have, for equal variance, 


which gives 


: n 2/' n i + (1 —■ njnj) k 2 

-- 0-540 -f 0-460 X 0-745 = 0-883 
: 1-64 


Thus with two-phase sampling 


Cjn,' Cj 


1-64+ 10 x 0-883 


and with single-phase sampling C/n^ c t lies between 10 and 11 , depending on 
visited ™ 8 ^ t0 thC ° mission of the ^ estimates on the stands that are 

. j f the , reasons wh y the use of eye estimates is here of little value is 

that the determination of the volumes of individual stands by means of a single 
sample plot per stand is very inaccurate. If more sample plots per stand were 
taken the overall variance of y would be reduced, while the covariance of y 
and * and the variance of x would remain unaltered. Under these circumstances 
two-phase sampling would be more advantageous. Given information on the 
withm-stand variance of the sample plots and the cost of taking different 
numbers of sample plots from a stand, the optimal number of sample plots 
per stand could be determined. r 

Example 8.17 .b 

If in the crop survey of Examples 6.16.b and 7.15 the additional cost of 
crop-cutting in order to obtain an estimate of yield is 20 times the cost of 
visiting a sampling point to ascertain the nature of the crop, calculate the optimal 
ratio of the number of yield determinations to total sample points, and the 
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number of points required to give an estimate of the total yield with a standard 
error of 5 per cent. 

From the results already given we have 

p = 2,202/33,255 = 0-0662 q = 0-9338 
r = 15-7 V (r) = 3-5 2 V (r)/r 2 = 0-0497 

Hence, from equation 8.17.f, 

'•0662 X -0497 


ft /-06 


= 0-0133 


•9338 X 20 

Substituting in equation 8.17.e, 

(-05) 2 n 0 ' = -9338/-0662 + -0497/-0133 
no = 7140 n = 95 n' — 473 

The large number of points that have to be visited to ascertain the crop is 
accounted for by the small fraction of the total land area under crop. If the 
crop is an important one it will occupy a considerably larger fraction of the 
cultivated area, and if, therefore, the non-cultivated areas can be excluded, 
the total number of points required will be considerably reduced. Alterna ive y 
sparsely cultivated areas may be sampled with a lower intensity. 

P It is also worth noting that if several crops have to be surveyed it will 
probably be possible to make the acreage determinations of all crops 
simultaneously prior to the crop-cutting work. This will alter the above cost 
relationships. The general case can be dealt with by minimization of the 
combined cost function. In the simple case m which there are a number o 
crops each occupying the same area and having the same variance and cost 
relationships, and in which the same accuracy is required for each crop, the 
above solution holds, the cost of the acreage determinations being spread 
equally over all the crops, and adjustment of c being made for the cost o 
revisits. Thus in the above example, with 5 crops and a cost of revisit per 
point of double the original cost (owing to wider dispersion), all that is necessary 
is to put cjc' = 110. We then find = 9160, n — 52, n — 606. 

Example 8.17 .c 

If county lists of farms are not available, and if the cost of the construction 
of a list of the farms of a parish is 10 times the cost of visiting a single farm 
within the parish and ascertaining the wheat acreage, determine the optima 
sampling fractions at the first and second stage which will give estimates o 
the acreage of wheat having a standard error of ± 4500 approximately 
10 per cent.), using the methods of sampling followed in samples B, an * 
of Hertfordshire farms. 
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The same general conclusions hold when the variance-cost relation is of 
a more complicated form than that given above. A case of this type is 
illustrated in Fig. 8.15, in which a loss curve of the form « V (Y) has been 
inserted. We see that with the more accurate methods of sampling the minima 
of the cost-plus-loss functions (shown by broken lines) are attained when both 
the cost of the sampling and the loss due to errors are less than with the less 

accurate methods. . , 

Other loss functions will lead to more complicated expressions lor the 
average loss. The most general loss function which is capable of relatively 
simple expression in terms of V(Y) is that in which the loss due to a positive 
error is equal to a Z b , and. that due to a negative error is a' (— Z) b , a, a and b 
being constants. Provided the distribution of errors has the same form for all 
values of V (Y), the average loss is then equal to a" <s 0 b , where a 0 2 = V (Y) 
and a" has a value which is a linear function of a and a . The actual lineal, 
function can only be determined if the distribution function of the errors is 
known. In general terms, if the distribution function of the errors of Y is 
f y (z) dz, where z = Zja u , we have 

a" = a J o +c ° z»fy (z) dz + o' / * « (- *?fi (*) dz 

If the distribution of the errors is normal, f t («) dz will be of the form given 
in Section 7.3, with a = 1. The two integrals will in this case (as in any 
symmetrical distribution) be equal. Their values for any value of h can be 
obtained from existing tables*, those for b = 1*0, 1*25, 1*5, 1*75, 2*0 being 
0*3989, 0-4097, 0*4300, 0 * 4599 , 0*5 respectively. It must be emphasized, 
however, that the distribution of sampling errors is frequently not sufficiently 
normal for the use of these values to be justified. In such cases, also, the 
form of the distribution may be expected to change with change in the size of 

the sample. . 

With this more general loss function we require to minimize the function 


which will be minimum when 

/ h \i-bj2 

<C - w (c = c„ - *) 

This equation can easily be solved by trial and error, or directly if k can be 
neglected, as will be the case when all sampling fractions are small. The 
same general conclusion that the accuracy should be increased with a more 
accurate method of survey still holds. 

When V (Y) is a more complicated function of C (possibly only determined 
in numerical form) the minimum of the cost-plus-loss function can itself be 
determined by trial and error. 

* Tables of the Gamma function. 
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8.19 Concluding remarks 

, P recedm g sections give an indication of the ways in which the efficiencies 
of different sampling methods can be compared, and the techniques of 
determining the optimal sampling fractions, size of sample required for a given 
accuracy, etc. It has been further shown that the accuracy which should 

results 16 *^ ^ ^ ltSdf felated t0 the loSses resultin 8 from errors in the survey 

Since determination of the optimal accuracy from the expected losses due 
to errors demands knowledge of the loss-function, it will chiefly be of relevance 
when action m the economic sphere has to be based on the results of the 
survey. An error in the estimate of the yield of a crop, for instance, may 
require changes m an import programme, or may lead to wastage, and the 
• resultant additional costs may be assessable, at least roughly. The losses 
due to errors in estimates provided by surveys of the research and investigational 
type can scarcely be assessed. Indeed, it is usually impossible to give any 
quantitative estimate in monetary terms of the value of the information provided 
by such surveys The decision to undertake the survey, and the accuracy 
aimed at, must then be a matter of judgment on the part of those who require 
the information, and those who are concerned with the allocation of resources. 

ven if the optimal accuracy cannot be quantitatively determined, arbitrary 
decisions on the accuracy required should as far as possible be avoided. Before 
any decision as to accuracy is taken, estimates should be prepared of the costs 
of obtaining results of differing degrees of accuracy, and these estimates should 
be considered m relation to the purposes for which the results are required. 

Minimization of costs can of course be carried out whether or not a loss- 
unction is available. In this chapter we have only considered this minimization 
when a single quantity requires estimation. In most censuses and surveys 
such treatment would be an over-simplification. A number of quantities will 
require to be estimated, frequently for many domains of study. It may then 
be nec essary to carry out a more elaborate investigation, minimizing the cost 
or e ned accuracies of all the estimated quantities. Alternatively, if loss- 
unctions are available for all of these quantities, the combined cost-plus-loss 
function can be minimized. Frequently, however, one'of the quantities is of 
ominan importance, and the situation is such that when adequate accuracy 
is attained on this quantity the remaining quantities are determined with more 
than the required accuracy. In this case minimization can be conducted solely 
with reference to this quantity. y 

Many of the examples worked out in this chapter are based on very small 
amounts of data, and the conclusions reached on the relative efficiencies of 
methods, even in the particular circumstances of the chosen examples, 
must therefore be treated with reserve. These examples are, in fact, merely 
“ ded t0 jUustrate the computational procedures, and bring out the various 
points that have to be taken into account when making calculations of relative 
efficiencies, optimal sampling fractions and size of sample. They are in no 
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SoT" ,s general i„ve s , ig ,, io „ s int0 the relative effidency „ f ^ ^ 

,n ., ,mnd ; h “ n ° «'■>■«»« 

m the practical planning of surveys Tf th T and S1Ze of sam P Ie are required 

the optimal tbe total cost or the total nf Val . Ues ado P ted are somewhere near 
the minimum. ’ t0tal ° f costs -plus-losses, will be very near 

•iJlZLI2ZL d T:tdZ T derakiag 1 s " 5Cy by «* «* 

methods. As we have seen e h h 1 , base , exact planning of the sampling 
will enable future surveys on si^arn^t* 1 - 8 ? V6S provide inf °rmation which 
In surveys on planned ‘ 

m the planning is that information will be required 6 both^ t0 be kept ,n mind 
costs. Equally, if preliminary rough estimates ^tn ^ ™ nanceS and 011 
can be designed so as to nrovidf h ,. re re< l ulre d> pilot investigations 

Which to b L th,°pL“ g Pr S'a‘C sT"f ■' “ "" “ “ 0r “ tU> " « 

no. tavinffif “■”»“»? ”«*>»* depends 

relevant to the method! concerned Thu! thf “ , on J 18 ™® da,a "Weh are 
estimation of wheat yieldsTv stt I th ® al P d °t investigation on the 

™ of sufficient si* ,“ S “«‘» 8-12 

components of variance with all nprp C e ^“to-field and farm-to-farm 

the Survey of Fe“ r Practfcl ° n the other hand, 1" 

now been accumulldJ t r^n Kf “ V< 7 krgC amount of da ‘ a has 
components of variationofthefaSTf* * ^ 

new-arable field of each cron was take ressi ”S®> Slnc e only one old- and one 
as a defect in the planni^oTThis ^ This must be regarded 

had a pair of fields been taken for tbe^ 5 -’ W ^ COtdd have been remedied 
the farms. Incidentally lack of this^f 008 Cr ° PS ? n a Smad proportion of 
consideration of the question of the 7° orrnatl ° n has also prevented any 

“thT PraCtiCC fr ° m fidd t0 "stelrop. 1131 farmCrS ^ 

requires effidency of surve y methods 

The need for thorough in this ‘hapter. 

methods in different fircuZS f-ent sampling 

more will be made and reported in thf future S.I be , h ° ped that ma ny 
neglected because, once a survev h * mvestl g atl ons are often 

it could have been carried on/m be ® n . c °mpleted, the question of whether 
that survey is concerned One of °^ e efficientl y k f* eIy historical as far as 

of sample" censuses and su™tsTX"7^ the ^ory and practice 

is that permanent organizations—often parttf matched tT' T™ years 
institutes—have been set un in „ n i, ’ or attached to, research statistical 

have been actively engaged both in the nLn • C ° Un ^ le ®- These organizations 
covering various fields g of enauirv rl nln S a nd the execution of surveys 

*” '° ,h ' dM “- “ & ine.nsT^VctffigVta ^ 
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had a continuing interest in 

who have both the training and expen.nc. to £2 T„ parti the 

Further progress may be “Sloped areas will be likely 

problems drat artse m “iTSibaZ, wh£ more emttres which 

to receive very muchm ° re th " f in aI f d execution of surveys in these 
rjrde.a“e“ olf in to way 4. a body of experience be bull, up 
whth S relevant to the special problem, of such surveys. 
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Table A1 —Random numbers 
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34 

29 

78 

64 

56 

07 

82 

52 

42 

07 

44 

38 

15 

51 

00 

13 

42 

57 

60 

86 

32 

44 

09 

47 

27 

96 

54 

49 

17 

46 

09 

62 

90 

52 

84 

77 

27 

18 

18 

07 

92 

46 

44 

17 

16 

58 

09 

79 

83 

86 

19 

62 

06 

76 

50 

03 

10 

26 

62 

38 

97 

75 

84 

16 

07 

44 

99 

83 

11 

46 

32 

24 

20 

14 

85“ 

88 

45 

23 

42 

40 

64 

74 

82 

97 

77 

77 

81 

07 

45 

32 

14 

08 

32 

98 

94 

07 

72 

52 

36 

28 

19 

95 

50 

92 

26 

11 

97 

00 

56 

76 

31 

38 

80 

22 

02 

53 

53 

37 

85 

94 

35 

12 

83 

39 

50 

08 

30 

42 

34 

07 

96 

88 

54 

42 

06 

87 

98 

70 

29 

17 

12 

13 

40 

33 

20 

38 

26 

13 

89 

51 

03 

74 

17 

76 

37 

13 

04 

56 

62 

18 

37 

35 

96 

83 

50 

87 

75 

97 

12 

25 

93 

47 

70 

33 

24 

03 

54 

99 

49 

57 

22 

77 

88 

42 

95 

45 

72 

16 

64 

36 

16 

00 

04 

43 

18 

66 

79 

16 

08 

15 

04 

72 

33 

27 

14 

34 

09 

45 

59 

34 

68 

49 

12 

72 

Of 

34 

45 

31 

16 

93 

32 

43 

50 

27 

89 

87 

19 

20 

15 

37 

00 

49 

52 

85 

66 

60 

44 

68 

34 

30 

13 

70 

55 

74 

30 

77 

40 

44 

22 

78 

84 

26 

04 

33 

46 

09 

52 

74 

57 

25 

65 

76 

59 

29 

97 

68 

60 

71 

91 

38 

67 

54 

13 

58 

18 

24 

76 

27 

42 

37 

86 

53 

48 

55 

90 

65 

72 

96 

57 

69 

36 

10 

96 

46 

92 

42 

45 

00 

39 

68 

29 

61 

66 

37 

32 

20 

30 

77 

84 

57 

03 

29 

10 

45 

6 6 

04 

26 

29 

94 

98 

94 

24 

68 

49 

69 

10 

82 

53 

75 

91 

93 

30 

34 

25 

20 

57 

27 

16 

90 

82 

66 

59 

83 

62 

64 

11 

12 

67 

19 

00 

71 

74 

60 

47 

21 

29 

68 

11 

27 

94 

75 

06 

06 

09 

19 

74 

66 

02 

94 

37 

34 

02 

76 

70 

90 

30 

86 

35 

24 

10 

16 

20 

33 

32 

51 

26 

38 

79 

78 

45 

04 

91 

16 

92 

53 

56 

16 

38 

23 

16 

86 

38 

42 

38 

97 

01 

50 

87 

75 

66 

81 

41 

40 

01 

74 

91 

62 

31 

96 

25 

91 

47 

96 

44 

33 

49 

13 

34 

86 

82 

53 

91 

00 

52 

43 

■ 

48 

85 

66 

67 

40 

67 

14 

64 

05 

71 

95 

86 

11 

05 

65 

09 

68 

76 

83 

20 

37 

90 

14 

90 

84 

45 

11 

75 

73 

88 

05 

90 

52 

27 

41 

14 

86 

22 

98 

12 

22 

08 

68 

05 

51 

18 

00 

33 

96 

02 

75 

19 

07 

60 

62 

93 

55 

59 

33 

82 

43 

90 

20 

46 

78 

73 

90 

97 

51 

40 

14 

02 

04 

02 

33 

31 

08 

39 

54 

l6 

49 

36 

64 

19 

58 

97 

79 

15 

06 

15 

93 

20 

01 

90 

10 

75 

06 

40 

78 

78 

89 

62 

05 

26 

93 

70 

60 

22 

35 

85 

15 

13 

92 

03 

51 

59 

77 

59 

56 

78 

06 

83 

07 

97 

10 

• 88 

23 

09 

98 

42 

99 

64 

61 

71 

62 

99 

15 

06 

51 

29 

16 

93 

68 

71 

86 

- 85 

85 

54 

■ 87 

66 

47 

54 

73 

■ 32 

08 

11 

12 

44 

95 

92 

63 

16 

26 

. 99 

61 

65 

53 

58 

. 37 

78 

■ 80 

• 70 

42 

10 

> 50 

> 67 

42 

32 

17 

55 

85 

74 

14 

: 65 

- 52 

; 68 

75 

87 

59 

36 

■ 22 

41 

26 

■ 78 

63 

■ 06 

55 

13 

■ 08 

27 

01 

50 


This table forms part of a larger table of random numbers given in Statistical Tables for Biological , 
Agricultural and Medical Research by R. A. Fisher and F. Yates, Oliver & Boyd, Edinburgh (3rd edition, 
1948), and is reproduced by kind permission of the senior author and the publishers. 
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Table A2— The normal distribution 


Probability of obtaining deviations (positive or negative) greater 
than given multiples of the standard deviation 


Deviation 

z/g 

Probability 

P 

Deviation 

z/g 

Probability 

P 

Deviation 

z/g 

Probability 

P 

0-0 

1-0000 

1*0 

•3173 

2-0 

•0455 

0*1 

•9203 

M 

•2713 

2*1 

•0357 

0*2 

•8415 

1*2 

•2301 

2-2 

•0278 

0*3 

•7642 

1*3 

*1936 

2*3 

*0214 

0*4 

•6892 

1*4 

*1615 

2*4 

•0164 

0*5 

•6171 

1*5 

•1336 

. 2*5 

*0124 

0*6 

*5485 

1*6 

•1096 

2-6 

•0093 

‘ 0*7 

*4839 

1 *7 

•0891 

2*7 

•0069 

0*8 

. *4237 

1*8 

•0719 

2*8 

•0051 

0*9 

•3681 

1*9 

•0574 

2*9 

•0037 

1*0 

•3173 

2*0 

•0455 

3*0 

•0027 
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BIBLIOGRAPHY ON SAMPLING 


This bibliography has been drawn up by Mr. D. R- Read. It is based 
on a bibliography prepared by the Statistical Office of the United Nations 
though not all the references in the United Nations’ bibliography are included 

here. 


The papers have been classified under the following heads 

(A) Theory and methods. 

(B) Machine methods. 

(C) Population censuses. 

(D) Sociology, nutrition, health, etc. 

(E) Opinion surveys and market research. . 

(F) Economics: surveys of industry, censuses of production, labour 

force, etc. 

(G) Agricultural economics and farm practice. 

(H) Crop estimation and forecasting, etc. 

(I) Forestry and land utilization surveys. 

' (J) Estimation of wild populations. 


Since a single paper does not necessarily deal with only one subject, the 
subject classification must be taken as approximate only. A certain amount 
of general theory, for example, will be found in papers primarily dealing with 
special applications. In some instances where the original paper could not 
be consulted the classification has been made from the title and journal. Papers 
by the same author may be found under more than one heading, but papers 
by more than one author are indexed in the section concerned under the name 
of each author, so as to avoid difficulty in tracing all papers by a given author. 


BOOKS 

General 

Baehne G. W. (1935). “ Practical applications of the punched card method m 

colleges and universities.” New York : Columbia University Press. ,, 

Blankenship, A. (1943). “ How to conduct consumer, and opinion, research 

(2nd edn. 1945). New York : Harpers. _ 

Cantril H 1944). “ Gauging public opinion.” Princeton University Press. 

Schman; C. W., Ackoff, R. L„ and Wax, M (1947). “Measurement of consumer 

interest.” Philadelphia: University of Philadelphia Press. 

Fisher, R. A. (1925). “ Statistical methods for research workers (lOth.edn., 1946). 

Edinb urgh : ftf experiment s ” (4th edn., 1947). Edinburgh: 

H™k1Tp. (1942). “Principles of punch-card machine operation.” 

New York: Thomas Y. Crowell. ,, -rr T 

Kendall M. G. (1943, 1946). " The advanced theory of statistics. Vol. I (3rd 

edn., 1947) & Vol. II (2nd edn., 1948). London: Griffin Walwr _ ■ 

Peatman T. G. (1947). “ Descriptive and sampling statistics New York . Haro^s. 

Rhodes, E. C. (1933). “ Elementary statistical methods (8th edn., 1948). London . 
Routledge. 
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Bias 9-17, 143 ; permissible, 17 • 

estimation of, 239 ; in selection 9-1 5 ' 
65, 80 84, 165, 222, 240 ; in demarca¬ 
tion of units, 15, 164 ; in eye estimates 

7 ^ 7 1 7 63, i A 65 i 222 ; in estim ation,' 
16, 73, 77 145, 162, 174 ; in estimate 
of error, 198. 

Bibliography, 299-311. 

Biological sampling, 48, 236 

Blocks, city, 68 , 71. 

Blocks, randomized, 105. 

Blythe, R. H., 71. 


Calcutta Institute of Statistics, 83. 

Ca 222 ratl0n ° f eye estimates > 43, 88, 165* 

Cards, 104, 109 ; see also Cope-Chat cards 
ana punched cards. 

Causal relationship, 131 188 
Cells, 41. 

Census of Woodlands, 16, 44 46 83 QQ 
163, 220, 232, 238, 242, 257, 288.’ * 

Central Office of Information, 78. 

Change, see successive occasions 
Checks on field work, 106 ; on computa¬ 
tions, 124 ; by comparison with com¬ 
plete returns, 31, 144 . 

Chi-squared test, 22 , 28, 200 . 

Cluster sampling, 20 . 

Cochran, W. G., 270 273 
C°dmg, 109, 110, 111, 118-123, 126 
! Coefficient of variation, 184. 

Collator, 118. - 

Collective characteristics, 122 . 

Commercial undertakings, 79 . 
Comparability, 50. 

Complete census, 3 ; combination with 
sample, 47. 

Composite sampling scheme, 46. 
Compulsory returns, 59 . 

Computations, preliminary, 108, 117 123 * 
checks on, 124 ; see also analysis 
Constants, fitting of, 137. 

Consumer preferences, 79 *. ‘ 

Control, machine, 114. 

Control characters, 40. 

Cope-Chat cards, 104, 109, 110. 

Correlation coefficient, 176, 199 . 

Costs, 143 ; minimization of, 283-296 
Counter, 113. 

Counts, in analysis, 109, 110 , 111 113 115 

Covariance, 198. • * . 

Coverage, 49, 141. 

Cox, G., 138. 
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Crockery breakages, 55. 

Crop estimation, 42, 87-92 ; areas, 168, 
224 272 286, 289, 290 ; yields per 

acre, 165, 168, 222, 224, 264, 273, 286, 
289,' 291 ; sampling of standing crop, 
13/l5, 90, 99. 

Crop forecasting, 92. 

Cross footing multiplying punch, 117. 
Crowds, 43. 

Cruising, 42, 90. 


Deductions from surveys, qualifications 
of, 131, 212. 

Defective sample, adjustment for, 12J. 
Degrees of freedom, 185, 206. 

Deming, W. E., 65, 71, 127. 

Description of survey, 141. 

Distributors, 114. * 

Domains of study, 24, 28 ^ est *^ ates f< ’ 
146 ; errors for, 146, 202, 210. 
Duplicate samples, 242. 

Duplication in frame, 60. 

Dwellings, 67, 74. 


Eckler, A. R., 77. 

Economic institutions, frames fox, tv- 
Efficiency, 109, 144, 145, 200, 246-296■; 
definition, 247 ; see also under types of 
sample. 

Efficient estimate, 247. 

Election forecasts, 80. 

Electoral lists, 66, 71. 

Employment, 76. 

End corrections, 175. ,,, , 

England and Wales, see Census of Wood¬ 
lands, National Farm Survey, Survey of 
Fertilizer Practice. 

Error, sampling; see random sampling 
error and bias. 

,.d 

15 • in fieldwork, 106 ; in computations, 
124; rounding off, 238; grouping, 238, 
see also investigators, tests of. 
Estimates, alternative, 145. 

Estimation of population values 145-182 
rules for, 147; of sampling errors 
183-245 ■ of size of sample and relative 
efficiency, 94-99, 246-296; see also 

under types of sm ^PJ e - . „ 

Experiments, 105, 131, 212. 

Explanatory notes, 103. 

Exploratory surveys, 48, 99 

Eye estimates, calibration of, 43, 88, lb5. 


Factorial design, 105. 

Factories, 79. 

Factors, 131. 

Families, 121, 217. 

Family Census (U.K.), 63, 64, 130 
Family income (Norfolk-Portsmouth), 96, 

l 86 , 239 - 7 xt 4~f fr\ 

Farms, 74, 81, 273, 288, also Hertford¬ 
shire farms. 

Fertilizer Practice, see Survey ol. 

Fiducial probability, 191, 236. 

Field, of punched card, 113. 

Field work, organization of, 102-1UU, 
control of accuracy, 106. 

Fields, 81, 273, 288. ‘.L 6 

Finite sampling, corrections for, 187, 
Finney, D, J., 237. 

Fisher, R. A., 105, 201, 267. 

Fitting constants, 137. 

Fixed sample, 45. 

Flats, 68. 

Follow up, see non-response. 

Food offices, 64. ^ 

Forecasts, 80, 92. x 

Forestry, frames for, 83-87, see also 
Census of Woodlands. 

Forms, 57, 103-105. , AA 

Frame, 20, 60-87, 144 ; defects of, 60, 
144 • human populations, 02-ls , 
economic institutions, 79 ; agriculture, 
81-87- forestry, 83-87; construction 
of second-stage, 34 68, 71 : fr °® 

censuses, 65 ; from lists, 63, 66, 70, 
79 ; from maps, 68-75, 79, 81 86, 
from aerial photographs, 86. 

Frankel, L. R., 77. 

Functions, errors of, 196. 


Galvani, L., 40 . 

Gamma function, 293. 
Gang-punching, 116. 

Geoffrey, L., 127. 

Geographical scope, 49, 141. 
Gini, C., 40. 

Glass, D. V., 130. 

Greece, population census 71. 
Grouping, 118, 186, 188, 238. 


Half-open interval, 67, 68. 

Hansen, M. H., 36, 65 
Haphazard selection, 10. 

Hertfordshire farms, samples for wheat 
acreage 30, 36; random sample, 97, 
152, 159, 161, 162, 189, 205, 214, 21 , 
220 252 258; stratified sample, 150, 
203 249, 251; variable sampling 

\ • _ __ i ka 9A7 252 255. 256 , 
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samples of parishes, 169, 226, 264, 266; 
two-stage samples, 270, 271, 290. 
Hollerith, see punched cards. , 

^ Households, 78, 121, 217. 

* Houses, percentages defective, 149, 195. 
Hurwitz, W. N., 36. 


I.B.M., see punched cards. 

Inaccuracy, in frame, 60. 

Inadequacy, in frame, 60, 61, 78. 

Incomplete census, 2. 

Incomplete coverage, 10. 

Incomplete results, adjustment for, 129. 

Incompleteness, in frame, 60. 

Independence, 196. 

Independent samples, 45. 

Index numbers, calculation of, 117. 

India, Calcutta Institute of Statistics, 83. 

Industrial undertakings, 79. 

Information, description of, 141 ; required, 
51 s ; methods of collection, 57; 
practicability, 55. 

Insect populations, 44. 

Instructions, 103, 141. 

Integral values of supplementary variate, 
217. 

Interactions, 105, 140, 211. 

Interpenetrating samples, 44, 105, 107, 
143, 241, 242 ; examples, 83. 

Inter-relations between units, 54. 

Intra-class correlation, 267. 

Investigators, 58, 105 ; tests of, 44, 99, 
105, 107, 143, 241 ; instructions to, 
103 ; conditions of work, 106. 

Iowa State College Statistical Laboratory, 
73. 

Italy, population census, 40. 


Jessen, R. j., 71, 73. 


Kempthorne, O,, 71, 117. 
King, A. J., 73. 

Kiser, C. V., 11. 

Kraals, 159, 214. 


Land utilization surveys, 86. 

Limits of error, 191, 236. 

Line sampling, 42, 85, 86 ; errors, 229 ; 
examples, 232. 

Linear functions, standard errors of, 196. 


Listing, 114. 

Lists, see frame and systematic sample. 
Livestock, 81. 

Localized population survey, 75. 
Losses due to errors, 292. 


Mahalanobis, P. C., 83, 92. 

Maps, use as frames, 68-75, 79, 81-86 ; 
areas from, see point and line sampling. 

Marginal categories, 49. 

Mark sensing, 109. 

Market research, 79. 

Master cards, 116. 

Master sample, 65, 73, 75. 

Mathison, I., 81. 

Mean, arithmetic, 145 ; rule for estimation 
of, 148 ; geometric, 145 ; working, 185 ; 
correction for, 185. 

Mean square, 206. 

Mean square deviation, 183. 

Measurement, errors in, 15. 

Median, 145. 

Milk, composition of, 178, 181, 234, 235, 
262. 

Ministry of Agriculture, 82, 128, 158. 

Ministry of Home Security, 67. 

Morbidity, 12. 

Moving observer, 43. 

Multi-phase sample, 38 ; estimates, 157, 
159, 162 ; errors, 213, 219 ; size and 
efficiency, 258, 286. 

Multi-stage sample, 18, 34 ; estimates, 
170, 171 ; errors, 226; size and 

efficiency, 98, 268, 285 ; examples, 71, 
77, 81, 84. 

Multi-stage sample with uniform overall 
sampling fraction, 36, 148, 171, 278, 
287 ; examples, 71, 77 ; with adjust¬ 
ment of proportions of second-stage 
units, 78. 

Multiple classification, analysis of, 
131-141. 

Multiple punching, 119. 

Multiple stratification, 25, 254 ; examples, 
73, 77 ; without control of sub-strata, 
25. 

Multiplying punch, 117. 


National Agricultural Advisory Service, 
59. 

National Farm Survey, 115, 117, 128, 158. 
National Register, 64. 

Natural units, 20 ; hierarchy of, 121. 
Non-response, 59, 107, 130 ; sub-sample 
for, 108. 

Norfolk-Portsmouth, Virginia, 186. 
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Normal distribution, 190 ; sample from, 
149, 185, 188, 190, 192, 298. 

Normal law of error, 190. 

Notation, 7, 146. 

Nuffield Trust, 78. 


Observation, errors of, 15. 

Observers, see investigators. 

Opinion surveys, 79 ; effect of stratifica¬ 
tion, 248. 

Optimal values, 284. 

Optional allocation, 18, 29. 

Ordnance Survey, 75, 82, 83, 84. 
Orthogonality, 211. 

Out-of-date frame, 60. 

Overall estimates, 175. 


Partial replacement, see successive 
occasions. 

Patterson, H. D., 179, 180. 

Percentage standard deviation, 96, 184. 

Percentage standard error, 95, 184. 

Percentages, choice of, 108 ; calculation 
of, 117 ; estimation of, 148 ; standard 
error of, 94, 193. 

Personnel, 142. 

Phase, see multi-phase. 

Pilot surveys, 48, 99, 273. 

Planning of surveys, 48-101, 246, 294. 

Point sampling, 35, 69, 82, 86 ; estimates, 
167 ; errors, 224 ; size and efficiency, 
262, 286 ; examples, 272. 

Pooled estimate of error, 205, 236. 

Pooling of classes, 137. 

Population, human, 121, 217 ; frames for, 
62-78; localized surveys of, 75-78; 
special classes, 78. 

Population, statistical, 20 ; finite, 187 ; 
to be covered, 49, 141 ; values, see 
estimation. 

Population census, Greece, 71 ; Italy, 40 ; 
Southern Rhodesia, 159, 214 ; U.K., 

53 ; U.S.A., 65 ; frame from, 65. 

Postal enquiry, 58, 107, 130. 

Potato survey, 131-141, 199, 209. 

Powers-Samas, see punched cards. 

Precision, relative, 246-283 ; definition, 
247. 

Precoding, 120. 

Preliminary computations, see computa¬ 
tions,. 

Preliminary count (of dwellings), 68. 

Preliminary estimates, 85, 92. 

Printing, 113. 

Probability of selection proportional to 
size, sample with, 35, 36 ; errors, 224, 
225 ; estimates, 167, 169 ; size and 
efficiency, 262 ; examples, 36, 71, 77 ; 
see also point sampling. 


Progressive digiting, 115. 
Progressive totals, 115. 

Proportion, see percentages. 

Public opinion polls, 79. 

Punched cards, 109, 112-123, 126. 
Punching, 109, 112, 120, 126. 
Purpose of survey, 49, 51, 141. 
Purposive selection, 40, 80, 142. 


Qualitative variates, 94, 193, 232, 248 ; 

rule for, 148. 

Quality control, 48. 

Questionnaires, 52-59, 103-105; tests. 

of, 99, 104, 105; postal, 58, 107. 
Questions, wording of, 103. 

Quota method, 80, 142. 


Raising factor, 147 ; overall, 170. 

Random numbers, 21, 297. 

Random sample, 10, 21 ; estimates, 145, 
148, 152, 159, 162 ; errors, 183-196, 
212, 217, 218; size and efficiency, 94, 
248, 249, 256; examples, 31, 83. 

Random sampling error, 2, 9, 17 ; 

estimation of, 183-245 ; by sampling, 
238 ; from duplicate samples, 242 ; 
presentation of, 243 ; see also under 
types of sample. 

Random selection, 21 ; examples, 22. 

Random selection from areas, 22. 

Randomized blocks, 105. 

Rating offices, 66. 

Ratio, rule for estimation of, 148 ; 
standard error of, 198, 212. 

Ratio method, see supplementary informa¬ 
tion. 

Ration books, 64. 

Ratios, calculation of, 117. 

Read, D. R., 299. 

Regression coefficient, 155, 199 ; standard 
error of, 219 ; see also supplementary 
information and calibration of eye 
estimates. 

Rents, 158. 

Repeated surveys, 17, 79 ; see also- 

successive occasions. 

Reports, 141. 

Representative sample, 9, 84. 

Reproducing punch, 116. 

Response, failure of, see non-response. 

Rolling total tabulator, 114. 

Rounding off, 118, 238. 

Rowntree, B. Seebohm, 4. 

Royal Commission on Population, 64, 130. 
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Sample, types of, 20-47 ; also under 
separate types. 

Sample census, 2. 

Sample survey, 4. 

Sampling error, see random sampling 
error and bias. 

Sampling fraction, 18, 23, 24, 147, 148. 

Sampling process, 1 ; in censuses and 
surveys, 2-6; census, incomplete 
sample, 2 ; survey, sample, 4. 

Sampling units, 20 ; choice of, 19; multi¬ 
stage, 34 ; inter-relations between, 54 ; 
variation in size of, 19, 98, 279 ; rule 
festimation of number in population. 

Scoring, 147, 148, 195. 

Selection, methods of, 21, 142. 

Selection with probability proportional 
to size, see probability of selection 

Shaul, J. R. H., 160. 

Sheppard's correction, 239. 

Shops, 79. 

Sickness, 12. 

Size, probability proportional to, see 
probability of selection. 

Size of sample, determination of, 94-101, 
246-296 ; see also under types of 
sample. 

Size of strata, variation in, 98, 280. 

Size of unit, effect on sampling error, 19, 
98 ; variation in, 279. 

Snedecor, G., 105, 138. 

Social Survey, 78. 

Soil analysis, 82. 

Soil temperatures, 282. 

Solids-not-fat, see milk. 

Sorter, 113. 

Sorter-counter, 113. 

Sorting, 110. 

Southern Rhodesia, 159, 214. 

Stage, see multi-stage. 

Standard deviation, 96, 183, 190 ; per¬ 
centage, 96, 184. 

Standard error, 94, 183 ; percentage, 95, 
184 ; of qualitative variates, 94, 193 ' 
of mean, 96, 184, 187 ; of total, 96,' 
184, 187 ; of ratio, 198, 212 ; of 
multiple, 196; of product, 198; of 
sum, 197 ; of difference, 196 ; of linear , 
function, 196 ; of weighted mean, 197 ; 
of standard deviation, 192 ; effect of 
lack of independence, 198; see also 
under types of sample. 

Standardization, 157, 158, 159. 162 213 
219. 

Statistical analysis, see analysis. 

Statistician, functions of, 6, 49. 

Stephan, F. F., 65. 

Stevens, W. L., 138. 

Stock, J. S., 77. 

Stones, 12. 


Stratification after selection, 25, 32 152 
205. ’ 

Stratification, multiple, see multiple 
stratification. 

Stratified sample, variation in size of 
strata, 98, 280. * 

Stratified sample with one unit per 
stratum, 24, 78, 280. 

Stratified sample with uniform sampling 
fraction, 17^ 23, 146; estimates, 150, 
160, 164; errors, 201, 205, 215, 221; 
size and efficiency, 98, 248, 249, 256 ; 
examples, 31, 37. 

Stratified sample with variable sampling 
fraction, see variable sampling fraction. 

Streets, sampling by, 67, 69. 

SubXommission on Statistical Sampling, 

Sub-sample for non-response, 60; on 
successive occasions, 45; see also 
multi-phase sample. 

Sub-totals, 115. 

Substitution, 10, 108. 

Successive occasions, sampling on, 17, 45 ; 
estimates, 175, 179; errors, 233 ;'size 
and efficiency, 260 ; examples, 77. 

Sukhatme, P. V., 15, 255. 

Sum of squares, 185, 206 ; calculation of 
185. 

Summary punch, 116. 

Supervision, 19, 105. 

Supplementary information, 18, 32, 38, 
98, 145, 146 ; ratio method, 71, 155-162' 
171-174, 198, 212-218, 256 ; regression 
method, 155, 162-165, 171, 218-222, 
256 ; effects of errors in, 213. 

Survey, definition, 4. 

Survey of Fertilizer Practice, 57 81 111 
123, 171, 227, 240, 257, 264, 291. 

Syracuse, U.S.A., 11. 

Systematic sample from areas, 41 ; 
estimates, 174; errors, 229; size and 
efficiency, 282 ; examples, 83. 

Systematic sample from lists, 10, 29 ; 
estimates, 174 ; errors, 229 ; examples' 
64, 65, 67, 81. 


t DISTRIBUTION, 192. 

Tabulation, machine, 114. 

Tabulator, 113. 

Telephone enquiries, 80. 

Temperatures, soil, 282. 

Tepping, E. J., 127. 

Terminology, 7. 

Tests of questionnaires, 99, 104, 105 ; of 
investigators, 99, 105, 241 ; of 

significance, 188, 200. 

Thomas, G., 55. 

Timber, see Census of Woodlands. 
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Totals, computation of, 109, 110, 111, 
113 ; rule for estimation of, 147. 
Tracks, 70. 

Trailers, 116, 120. 

Training, 99, 105. 

Transformations, 236. 

Travelling, 19. 

Two-nine feature, 120. 

Two-way tables, analysis of, 131. 


U.S.A. employment estimates, 76; 
master sample, 73 ; population census, 
65 ; presidential election, 1948, 80. 

Undeveloped areas, 34, 42, 296 ; frames 
for, 70, 85-87. 

Unemployment, 76. 

Uniform sampling fraction, 23 ; overall, 
see multi-stage sample with uniform 
overall sampling fraction. 

Unit, natural, see natural units. 

Unit, sampling, see sampling units. 

United Kingdom, effects of air raids, 67 ; 
Family Census, 53, 64, 130 ; localized 
surveys, 76 ; Population Census, 53. 

United Nations, 141, 299. 


Variable sampling fraction, sample with, 
18, 28; estimates, 153, 161, 164; 


errors, 201, 205, 216, 221 ; size and 
efficiency, 98, 254, 256, 285; examples, 
31, 71, 81, 115, 128, 171. 

Variance, 183 ; unequal, 201, 207. 
Variate, 146. 

Variation, coefficient of, 184. 

Villages, 70. 


Weighted mean, 17 ; of sub-class means, 
134 ; of differences of sub-class means, 
136 ; standard error of, 197. 
Weighting factors, 108, 123. 

Wheat, 13, 15, 166, 223, 273 ; see also 
Hertfordshire farms. 

Wireworms, 236. 

World statistics, 51. 


Yates, F., 12, 13, 81, 138, 211, 237, 268, 
283. 

Yield per acre, see crop estimation and 
crop forecasting ; bias in, 16. 


Zacopanay, I., 268. 
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